Search is a major technique for planning. It amounts to exploring a state space of planning 
domains typically modeled as a directed graph. However, prohibitively large sizes of the search 
space make search expensive. Developing better heuristic functions has been the main technique 
for improving search efficiency. Nevertheless, recent studies have shown that improving heuristics 
alone has certain fundamental limits on improving search efficiency. Recently, a new direction 
of research called partial order based reduction (POR) has been proposed as an alternative to 
improving heuristics. POR has shown promise in speeding up searches. 

POR has been extensively studied in model checking research and is a key enabling technique for the scalability of model checking systems. However, POR theory has never been developed systematically for planning. In addition, the conditions for POR in model checking theory are abstract and not directly applicable in planning. Previous works on POR algorithms for planning did not establish the connection between these algorithms and existing theory in model checking.

In this paper, we develop a theory for POR in planning. The new theory we develop connects 
the stubborn set theory in model checking and POR methods in planning. We show that previous 
POR algorithms in planning can be explained by the new theory. Based on the new theory, we 
propose a new, stronger POR algorithm. Experimental results on various planning domains show 
further search cost reduction using the new algorithm. 

Categories and Subject Descriptors: I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search: Graph and tree search strategies

General Terms: AI planning, State-space search, Partial order reduction, Stubborn set 



1. INTRODUCTION 

State space search is a fundamental and pervasive approach to artificial intelligence 
in general and planning in particular. It is among the most successful approaches 
to planning. A major concern with state space search is that it has a high time and 
space cost since the state space that needs to be explored is usually very large. 

Much research on classical planning has focused on the design of better heuristic functions. For example, new heuristic functions have recently been developed by analyzing the domain transition graphs (DTGs) and causal graphs on top of the SAS+ formalism [Briel et al. 2007; Helmert and Röger 2008]. Despite the success of domain-independent heuristics for classical planning, heuristic planners still face scalability challenges on large-scale problems. As shown by recent work, search even with almost perfect heuristic guidance may still incur very high search cost [Helmert and Röger 2008]. Therefore, it is important to improve other components of the search algorithm that are orthogonal to the development of heuristics.

Recently, partial order based reduction (POR), a new way to reduce the search cost from an orthogonal perspective, has been studied for classical planning [Chen et al. 2009; Chen and Yao 2009]. POR has been extensively studied as a search-space reduction method in model checking, where it rests on a solid theoretical foundation. However, the theoretical properties of POR in planning have not yet been fully investigated. There are three key questions.

1) POR algorithms have been extensively studied in model checking. In fact, POR is an enabling technique for model checking, which would not be practical without POR due to its high time complexity. A substantial body of theory on POR has been developed in model checking. What are the relationships between the POR methods designed for model checking and existing work for planning? Understanding these relationships can not only help us understand both problems better, but can also potentially lead to better POR algorithms for planning.

2) In essence, all POR based algorithms reduce the search space by preventing certain actions from being expanded at each state. Although these POR algorithms all look similar, what differences in the quality of their reduction significantly affect search efficiency? We think it is important to investigate the reduction power of different POR algorithms.

3) Given that there is more than one POR algorithm for planning, are there other, stronger POR algorithms? To answer this question, we essentially need to find sufficient and/or necessary conditions for partial order based pruning. There are sufficient conditions for POR in model checking. Nevertheless, those conditions are abstract and not directly applicable in planning.

The main contribution of this work is to establish the relationship between the POR methods for model checking and those for planning. We leverage the existing POR theory for model checking and develop a counterpart theory for planning. This new theory allows existing POR algorithms for planning to be explained in a unified framework. Moreover, based on the conditions given by this theory, we develop a new POR algorithm for planning that is stronger than previous ones. Experimental results also show that our proposed algorithm leads to more reduction.

This paper is organized as follows. We first give basic definitions in Section 2. In Section 3, we present a general theory that gives sufficient conditions for POR in planning. In Section 4, we use the new theory to explain two previous POR algorithms. Based on the theory, in Section 5, we propose a new POR algorithm for planning that differs from and is stronger than previous ones. We report experimental results in Section 7, review some related work in Section 8, and give conclusions in Section 9.
2. BACKGROUND

Planning is a core area of artificial intelligence. It entails arranging a course of actions to achieve certain goals under given constraints. Classical planning is the most fundamental form of planning, which deals only with propositional logic. In this paper, we work on the SAS+ formalism [Jonsson and Bäckström 1998] of classical planning. The SAS+ formalism has recently attracted a lot of attention due to a number of advantages it has over the traditional STRIPS formalism. In the following, we review this formalism and introduce our notation.

Definition 1. A SAS+ planning task $\Pi$ is defined as a tuple of five elements, $\Pi = \langle X, O, S, s_I, s_G \rangle$.

- $X = \{x_1, \ldots, x_N\}$ is a set of multi-valued state variables, each with an associated finite domain $Dom(x_i)$.

- $O$ is a set of actions and each action $o \in O$ is a tuple $(pre(o), eff(o))$, where both $pre(o)$ and $eff(o)$ define some partial assignments of state variables in the form $x_i = v_i$, $v_i \in Dom(x_i)$. $s_G$ is a partial assignment that defines the goal.

- $S$ is the set of states. A state $s \in S$ is a full assignment to all the state variables. $s_I \in S$ is the initial state. A state $s$ is a goal state if $s_G \subseteq s$.

Definition 2. Two partial assignment sets are conflict-free if and only if they 
do not assign different values to the same state variable. 

For a SAS+ planning task, given a state $s$ and an action $o$, $o$ is applicable in $s$ when all variable assignments in $pre(o)$ are met in $s$. Applying $o$ to $s$ changes the state to a new state $s'$ according to $eff(o)$: the state variables that appear in $eff(o)$ are changed to the assignments in $eff(o)$, while the other state variables remain the same. We denote the resulting state after applying an applicable action $o$ to $s$ as $s' = apply(s, o)$; $apply(s, o)$ is undefined if $o$ is not applicable in $s$. The planning task is to find a path, or a sequence of actions, that transits the initial state $s_I$ to a goal state that includes $s_G$.
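The semantics above can be made concrete with a small Python sketch. It assumes a hypothetical encoding in which a state is a tuple indexed by variable (so `s[i]` is the value of $x_i$) and an action is a dict whose `pre` and `eff` entries map variable indices to values; `None` stands for the undefined result of `apply`. The `conflict_free` helper implements Definition 2.

```python
from typing import Dict, Optional, Tuple

State = Tuple[int, ...]        # s[i] is the value of state variable x_i
Assignment = Dict[int, int]    # partial assignment {variable index: value}


def conflict_free(p: Assignment, q: Assignment) -> bool:
    """Definition 2: p and q never assign different values to one variable."""
    return all(q[var] == val for var, val in p.items() if var in q)


def applicable(s: State, o: dict) -> bool:
    """o is applicable in s when every assignment in pre(o) holds in s."""
    return all(s[var] == val for var, val in o["pre"].items())


def apply(s: State, o: dict) -> Optional[State]:
    """Return apply(s, o), or None where apply(s, o) is undefined."""
    if not applicable(s, o):
        return None
    t = list(s)
    for var, val in o["eff"].items():   # variables in eff(o) get new values
        t[var] = val
    return tuple(t)                      # all other variables are unchanged


def is_goal(s: State, goal: Assignment) -> bool:
    """s is a goal state if the partial assignment s_G is contained in s."""
    return all(s[var] == val for var, val in goal.items())


# Hypothetical two-variable example.
move = {"pre": {0: 0}, "eff": {0: 1, 1: 1}}
print(apply((0, 0), move))   # (1, 1)
print(apply((1, 0), move))   # None: pre(move) requires x_0 = 0
```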

An important structure for a given SAS+ task is the domain transition graph 
defined as follows: 

Definition 3. For a SAS+ planning task, each state variable $x_i$ ($i = 1, \ldots, N$) corresponds to a domain transition graph (DTG) $G_i$, a directed graph with a vertex set $V(G_i) = Dom(x_i) \cup \{v_0\}$, where $v_0$ is a special vertex, and an edge set $E(G_i)$ determined by the following.

- If there is an action $o$ such that $(x_i = v_i) \in pre(o)$ and $(x_i = v_i') \in eff(o)$, then $(v_i, v_i')$ belongs to $E(G_i)$ and we say that $o$ is associated with the edge $e_i = (v_i, v_i')$ (denoted as $o \vdash e_i$). It is conventional to call the edges in DTGs transitions.

- If there is an action $o$ such that $(x_i = v_i') \in eff(o)$ and no assignment to $x_i$ is in $pre(o)$, then $(v_0, v_i')$ belongs to $E(G_i)$ and we say that $o$ is associated with the transition $e_i = (v_0, v_i')$ (denoted as $o \vdash e_i$).
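As an illustration of Definition 3, the Python sketch below collects the DTG edges from a list of actions. The encoding (actions as `(name, pre, eff)` triples over variable indices, the string `"v0"` for the special vertex $v_0$) is a hypothetical choice for this example, and only the edge sets $E(G_i)$ are returned, each annotated with the names of its associated actions. Note that a prevailing precondition, such as action `c` requiring $x_1 = 2$ without changing it, adds no edge to $G_1$.

```python
from collections import defaultdict

V0 = "v0"  # stands for the special vertex v_0 of every DTG


def build_dtgs(num_vars, actions):
    """Return {i: {(u, v): names of actions o with o |- (u, v)}} for each G_i."""
    dtgs = {i: defaultdict(set) for i in range(num_vars)}
    for name, pre, eff in actions:
        for var, new_val in eff.items():
            # First case of Definition 3: x_i = v_i in pre(o) gives edge (v_i, v_i').
            # Second case: no assignment to x_i in pre(o) gives edge (v_0, v_i').
            src = pre.get(var, V0)
            dtgs[var][(src, new_val)].add(name)
    return dtgs


acts = [("a", {0: 0}, {0: 1}),
        ("b", {}, {1: 2}),
        ("c", {0: 0, 1: 2}, {0: 1})]
dtgs = build_dtgs(2, acts)
print(sorted(dtgs[0][(0, 1)]))  # ['a', 'c']
print(dict(dtgs[1]))            # {('v0', 2): {'b'}}
```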
Intuitively, a SAS+ task can be decomposed into multiple objects, each corresponding to one DTG, which models the transitions of the possible values of that object.

Definition 4. For a SAS+ planning task, an action $o$ is associated with a DTG $G_i$ (denoted as $o \vdash G_i$) if $eff(o)$ contains an assignment to $x_i$.

Definition 5. For a SAS+ planning task, a DTG $G_i$ is goal-related if the partial assignments in $s_G$ that define the goal states include an assignment $x_i = g_i$ in $G_i$. A goal-related DTG is unachieved in state $s$ if $x_i = v_i$ in $s$ and $v_i \neq g_i$.

A SAS+ planning task can also specify a preference that needs to be optimized. 
A preference is a mapping from a path p to a numerical value. In this paper we 
assume an action set invariant preference. A preference is action set invariant 
if two paths have the same preference whenever they contain the same set of actions 
(possibly in different orders). Most popular preferences, such as plan length and 
total action cost, are action set invariant. 
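To illustrate the distinction, the Python sketch below spot-checks two preferences over paths given as lists of action names. Total action cost and plan length depend only on which actions occur, so reordering a path leaves them unchanged; the hypothetical `first_action_cost` preference is included only as a counterexample that depends on order and is therefore not action set invariant. The cost table is invented for the example, and checking the reorderings of one path is a sanity check, not a proof of invariance.

```python
from itertools import permutations

cost = {"a": 1, "b": 5, "c": 2}   # hypothetical action costs


def total_cost(path):
    """Total action cost: action set invariant."""
    return sum(cost[o] for o in path)


def first_action_cost(path):
    """Counterexample preference: depends on which action comes first."""
    return cost[path[0]]


def invariant_on(pref, path):
    """Does pref take a single value on every reordering of path?"""
    return len({pref(list(p)) for p in permutations(path)}) == 1


print(invariant_on(total_cost, ["a", "b", "c"]))         # True
print(invariant_on(len, ["a", "b", "c"]))                # True (plan length)
print(invariant_on(first_action_cost, ["a", "b", "c"]))  # False
```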

3. PARTIAL ORDER REDUCTION THEORY FOR PLANNING 

Partial order based reduction (POR) algorithms have been extensively studied for model checking [Varpaaniemi 2005; Clarke et al. 2000], which also requires examining a state space in order to prove certain properties. POR is a technique that allows a search to explore only part of the entire search space while still maintaining completeness and/or optimality. Without POR, model checking would be too expensive to be practical [Holzmann 1997]. However, POR has not been studied systematically for planning.

In this section, we will first introduce the concept of search reduction. Then, we 
will present a general POR theory for planning, which gives sufficient conditions 
that guide the design of practical POR algorithms. 

3.1 Search reduction for planning 

We first introduce the concept of search reduction. A standard search, such as breadth-first search (BFS), depth-first search, or A* search, needs to explore a state space graph. A reduction algorithm reduces the state space graph to a subgraph, so that the search is performed on the subgraph instead of the original graph. We first define the state space graph. In our presentation, for any graph $G$, we use $V(G)$ to denote the set of vertices and $E(G)$ the set of edges. For a directed graph $G$ and any vertex $s \in V(G)$, a vertex $s' \in V(G)$ is a successor of $s$ if and only if $(s, s') \in E(G)$.

For a SAS+ planning task, a state space graph for the task is a directed graph $\mathcal{G}$ in which each state $s$ is a vertex and each directed edge $(s, s')$ represents an action that will be explored during a search process. Most search algorithms work on the original state space graph as defined below.

Definition 6. For a SAS+ planning task, its original state space graph is a directed graph $\mathcal{G}$ in which each state $s$ is a vertex and there is a directed edge $(s, s')$ if and only if there exists an action $o$ such that $apply(s, o) = s'$. We say that action $o$ marks the edge $(s, s')$.
Definition 7. For a SAS+ planning task and a state space graph $\mathcal{G}$, the successor set of a state $s$, denoted by $succ_{\mathcal{G}}(s)$, is the set of all the successor states of $s$. The expansion set of a state $s$, denoted by $expand_{\mathcal{G}}(s)$, is the set of actions

$$expand_{\mathcal{G}}(s) = \{o \mid o \text{ marks } (s, s'),\ (s, s') \in E(\mathcal{G})\}.$$

Intuitively, the successor set of a state s includes all the successor states that 
shall be generated by a search upon expanding s, while the expansion set includes 
all the actions to be expanded at s. 
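For the original state space graph of Definition 6, both sets follow directly from the actions applicable in $s$. The Python sketch below assumes the same hypothetical encoding as before (tuple states, actions as dicts with `pre` and `eff` keyed by variable index). It also shows that two different actions may mark edges to the same successor, so $expand_{\mathcal{G}}(s)$ can be larger than $succ_{\mathcal{G}}(s)$.

```python
def applicable(s, o):
    return all(s[var] == val for var, val in o["pre"].items())


def apply(s, o):
    t = list(s)
    for var, val in o["eff"].items():
        t[var] = val
    return tuple(t)


def expansion_set(s, actions):
    """expand(s) in the original graph: every action applicable in s."""
    return {name for name, o in actions.items() if applicable(s, o)}


def successor_set(s, actions):
    """succ(s): the distinct states reached by the actions in expand(s)."""
    return {apply(s, actions[name]) for name in expansion_set(s, actions)}


# Hypothetical actions; a and d both lead from (0, 0) to (1, 0).
actions = {
    "a": {"pre": {0: 0}, "eff": {0: 1}},
    "b": {"pre": {}, "eff": {1: 1}},
    "c": {"pre": {0: 1}, "eff": {1: 0}},
    "d": {"pre": {0: 0}, "eff": {0: 1, 1: 0}},
}
print(sorted(expansion_set((0, 0), actions)))   # ['a', 'b', 'd']
print(sorted(successor_set((0, 0), actions)))   # [(0, 1), (1, 0)]
```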

In general, a reduction method maps each input state space graph $\mathcal{G}$ to a subgraph of $\mathcal{G}$. The POR algorithms we study remove edges from $\mathcal{G}$. More specifically, each state $s$ is only connected to a subset of all its successors in the reduced subgraph. We note that, by removing edges, a POR algorithm may also reduce the number of vertices that are reachable from the initial state, hence reducing the number of nodes examined by a search. The decision whether a successor state $s'$ remains a successor in the reduced subgraph can be made locally by checking certain conditions related to the current state and some precomputed information. Hence, a POR algorithm can be combined with various search algorithms.

For a SAS+ task, a solution sequence in its state space graph $\mathcal{G}$ is a pair $(s^0, p)$, where $s^0$ is a non-goal state, $p = (a_1, \ldots, a_k)$ is a sequence of actions, and, letting $s^i = apply(s^{i-1}, a_i)$ for $i = 1, \ldots, k$, $(s^{i-1}, s^i)$ is an edge in $\mathcal{G}$ for $i = 1, \ldots, k$ and $s^k$ is a goal state. We now define some generic properties of reduction methods.
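The definition translates into a direct check. In the Python sketch below, `actions` maps hypothetical action names to dicts with `pre` and `eff`, and the optional callback `kept(s)` (also hypothetical) returns the names of the actions whose edges a reduced graph keeps at $s$; leaving it as `None` checks against the original state space graph.

```python
def is_solution_sequence(s0, plan, actions, goal, kept=None):
    """True iff (s0, plan) is a solution sequence in the (reduced) graph."""
    def is_goal(s):
        return all(s[var] == val for var, val in goal.items())

    if is_goal(s0):              # s^0 must be a non-goal state
        return False
    s = s0
    for name in plan:
        o = actions[name]
        if any(s[var] != val for var, val in o["pre"].items()):
            return False         # a_i not applicable: no edge (s^{i-1}, s^i)
        if kept is not None and name not in kept(s):
            return False         # edge removed by the reduction
        t = list(s)
        for var, val in o["eff"].items():
            t[var] = val
        s = tuple(t)             # s^i = apply(s^{i-1}, a_i)
    return is_goal(s)            # s^k must be a goal state


acts = {"a": {"pre": {0: 0}, "eff": {0: 1}},
        "b": {"pre": {0: 1}, "eff": {1: 1}}}
print(is_solution_sequence((0, 0), ["a", "b"], acts, {1: 1}))  # True
print(is_solution_sequence((0, 0), ["b", "a"], acts, {1: 1}))  # False
```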

Definition 8. For a SAS+ planning task, a reduction method is completeness-preserving if, for any solution sequence $(s^0, p)$ in the state space graph, there also exists a solution sequence $(s^0, p')$ in the reduced state space graph.

Definition 9. For a SAS+ planning task, a reduction method is optimality-preserving if, for any solution sequence $(s^0, p)$ in the state space graph, there also exists a solution sequence $(s^0, p')$ in the reduced state space graph such that $p'$ has the same preference as $p$.

Definition 10. For a SAS+ planning task, a reduction method is action-preserving if, for any solution sequence $(s^0, p)$ in the state space graph, there also exists a solution sequence $(s^0, p')$ in the reduced state space graph such that the actions in $p'$ are a permutation of the actions in $p$.

Clearly, being action-preserving is a sufficient condition for being completeness- 
preserving. When the preference is action set invariant, being action-preserving is 
also a sufficient condition for being optimality-preserving. 

3.2 Stubborn set theory for planning 

Although there are many variations of POR methods, a popular and representative POR algorithm is the stubborn set method [Valmari 1988; 1989; 1990; 1991; 1993; 1998], used for model checking based on Petri nets. The basic idea is to form a stubborn set of applicable actions for each state and to expand only the actions in the stubborn set during search. By expanding a small subset of the applicable actions in each state, stubborn set methods can reduce the search space without compromising completeness.
Fig. 1. Illustration of stubborn set. The diagram plots the stubborn set condition A1 in Definition 11.

Since planning also examines a large search space, we propose to develop a stub- 
born set theory for planning. To achieve this, we need to handle various subtle 
issues arising from the differences between model checking and planning. We first 
define the concept of stubborn sets for planning, adapted from the concepts in 
model checking. 

Definition 11 (Stubborn Set for Planning). For a SAS+ planning task, a set of actions $T(s)$ is a stubborn set at state $s$ if and only if

A1) For any action $b \in T(s)$ and actions $b_1, \ldots, b_k \notin T(s)$, if $(b_1, \ldots, b_k, b)$ is a prefix of a path from $s$ to a goal state, then $(b, b_1, \ldots, b_k)$ is a valid path from $s$ and leads to the same state that $(b_1, \ldots, b_k, b)$ does; and

A2) Any valid path from $s$ to a goal state contains at least one action in $T(s)$.

The above definition is schematically illustrated in Figure 1. Once we define the 
stubborn set T(s) at each state s, we in effect reduce the state space graph to a 
subgraph: only the edges corresponding to actions in the stubborn sets are kept in 
the subgraph. 

Definition 12. For a SAS+ planning task, given a stubborn set $T(s)$ defined at each state $s$, the stubborn set method reduces its state space graph $\mathcal{G}$ to a subgraph $\mathcal{G}_r$ such that $V(\mathcal{G}_r) = V(\mathcal{G})$ and there is an edge $(s, s')$ in $E(\mathcal{G}_r)$ if and only if there exists an action $o \in T(s)$ such that $s' = apply(s, o)$.
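Because the reduction is decided locally, a stubborn set method can be plugged into an ordinary search by filtering the actions expanded at each state. The Python sketch below does this for breadth-first search. The callback `stubborn(s)`, which returns $T(s)$ as a set of action names, is a hypothetical interface; computing a valid $T(s)$ is left to the conditions developed later in this section. The toy task with two independent variables is likewise invented for illustration.

```python
from collections import deque


def bfs_reduced(s0, actions, goal, stubborn):
    """BFS over the reduced graph G_r; returns (plan or None, #states generated)."""
    def is_goal(s):
        return all(s[var] == val for var, val in goal.items())

    parent = {s0: None}
    frontier = deque([s0])
    while frontier:
        s = frontier.popleft()
        if is_goal(s):
            plan = []
            while parent[s] is not None:       # walk back to s0
                s, name = parent[s]
                plan.append(name)
            return plan[::-1], len(parent)
        for name in sorted(stubborn(s)):       # only edges kept by T(s)
            o = actions[name]
            if any(s[var] != val for var, val in o["pre"].items()):
                continue                       # o is not applicable in s
            t = list(s)
            for var, val in o["eff"].items():
                t[var] = val
            t = tuple(t)
            if t not in parent:
                parent[t] = (s, name)
                frontier.append(t)
    return None, len(parent)


acts = {"a": {"pre": {}, "eff": {0: 1}}, "b": {"pre": {}, "eff": {1: 1}}}
goal = {0: 1, 1: 1}


def full(s):
    return set(acts)


def reduced(s):
    # From a state with x_0 = 0, every path to the goal uses a (A2 holds),
    # and a and b commute (A1 holds), so {a} is a stubborn set there.
    return {"a"} if s[0] == 0 else set(acts)


print(bfs_reduced((0, 0), acts, goal, full))     # (['a', 'b'], 4)
print(bfs_reduced((0, 0), acts, goal, reduced))  # (['a', 'b'], 3)
```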

A stubborn set method for planning is a reduction method that reduces the original state space graph $\mathcal{G}$ to a subgraph $\mathcal{G}_r$ according to Definition 12. In other words, in each state a stubborn set method expands only the actions in that state's stubborn set. In the sequel, we show that such a reduction method is action-preserving; hence, it also preserves completeness and, for action set invariant preferences, optimality.

Theorem 1. Any stubborn set method for planning is action-preserving. 

Proof. We prove that for any solution sequence $(s^0, p)$ in the original state space graph $\mathcal{G}$, there exists a solution sequence $(s^0, p')$ in the reduced state space graph $\mathcal{G}_r$ resulting from the stubborn set method, such that $p'$ is a permutation of the actions in $p$. We prove this fact by induction on $k$, the length of $p$.

When $k = 1$, let $a$ be the only action in $p$. By condition A2 in Definition 11, $a$ is in $T(s^0)$. Thus, $(s^0, p)$ is also a solution sequence in $\mathcal{G}_r$, and the stubborn set method is action-preserving in the base case.
When $k > 1$, the induction hypothesis is that for any solution sequence $(s, q)$ in $\mathcal{G}$ with $q$ of length at most $k - 1$, there exists a solution sequence $(s, q')$ in $\mathcal{G}_r$ such that $q'$ is a permutation of $q$. Now we consider a solution sequence $(s^0, p)$ in $\mathcal{G}$ with $p = (a_1, \ldots, a_k)$, and let $s^i = apply(s^{i-1}, a_i)$, $i = 1, \ldots, k$. If $a_1 \in T(s^0)$, we can invoke the induction hypothesis for the state $s^1$ and the remaining actions $(a_2, \ldots, a_k)$, which proves the claim for $k$.

We now consider the case where $a_1 \notin T(s^0)$. Let $a_j$ be the first action in $p$ such that $a_j \in T(s^0)$. Such an action must exist because of condition A2 in Definition 11.

Consider the sequence $p^* = (a_j, a_1, \ldots, a_{j-1}, a_{j+1}, \ldots, a_k)$. According to condition A1 in Definition 11, $(a_j, a_1, \ldots, a_{j-1})$ is also a valid sequence from $s^0$ which leads to the same state that $(a_1, \ldots, a_j)$ does. Hence, $(s^0, p^*)$ is also a solution sequence. Therefore, letting $s' = apply(s^0, a_j)$, we know that $(a_1, \ldots, a_{j-1})$ is an executable action sequence starting from $s'$. Let $p^{**} = (a_1, \ldots, a_{j-1}, a_{j+1}, \ldots, a_k)$; then $(s', p^{**})$ is a solution sequence in $\mathcal{G}$. From the induction hypothesis, there is a sequence $p'$ which is a permutation of $p^{**}$ such that $(s', p')$ is a solution sequence in $\mathcal{G}_r$. Since $a_j \in T(s^0)$, the edge $(s^0, s')$ is in $\mathcal{G}_r$, so $a_j$ followed by $p'$ is a solution sequence from $s^0$ in $\mathcal{G}_r$ and is a permutation of the actions in $p^*$, which is a permutation of the actions in $p$. Thus, the stubborn set method is action-preserving. ■

Since being action-preserving is a sufficient condition for being completeness-preserving and, when the preference is action set invariant, also for being optimality-preserving, we have the following result.

Corollary 1. A stubborn set method for planning is completeness-preserving. 
In addition, it is optimality-preserving when the preference is action set invariant. 

3.3 Left commutativity in SAS+ planning 

Note that although Theorem 1 provides an important result for reduction, it is not directly applicable, since the conditions in Definition 11 are abstract and not directly implementable in algorithms. We need sufficient conditions for Definition 11 that can facilitate the design of reduction algorithms. In the following, we define several concepts that lead to such sufficient conditions.

Definition 13 (State-Dependent Left Commutativity). For a SAS+ planning task, an ordered action pair $(a, b)$, $a, b \in O$, is left commutative in state $s$ if $(a, b)$ is a valid path at $s$, and $(b, a)$ is also a valid path at $s$ and results in the same state. We denote such a relationship by $s : b \Rightarrow a$.

Definition 14 (State-Independent Left Commutativity). For a SAS+ planning task, an ordered action pair $(a, b)$, $a, b \in O$, is left commutative if, for any state $s$ at which $(a, b)$ is a valid path, it is true that $s : b \Rightarrow a$. We denote such a relationship by $b \Rightarrow a$.

Note the following. 1) Left commutativity is not a symmetric relationship: $b \Rightarrow a$ does not imply $a \Rightarrow b$. 2) The order in the notation $b \Rightarrow a$ suggests that we should always try only $(b, a)$ during the search instead of trying both $(a, b)$ and $(b, a)$. Also, an action pair that is left commutative in some state is not necessarily state-independent left commutative. For instance, in a SAS+ planning task with three state variables $\{x_1, x_2, x_3\}$, action $a$ with $pre(a) = \{x_1 = 0\}$, $eff(a) = \{x_2 = 1\}$ and action $b$ with $pre(b) = \{x_2 = 1, x_3 = 2\}$, $eff(b) = \{x_3 = 3\}$ are left commutative in state $s_1 = \{x_1 = 0, x_2 = 1, x_3 = 2\}$ but not in state $s_2 = \{x_1 = 0, x_2 = 0, x_3 = 2\}$, as $b$ is not applicable in state $s_2$.

Fig. 2. Illustration of left commutative set. The left part plots condition L1 in Definition 15 and the right part plots the strategy in the proof of Theorem 2.
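The example can be checked mechanically by simulating both orders, which is exactly Definition 13. In the Python sketch below, variables $x_1, x_2, x_3$ are mapped to tuple indices 0, 1, 2, and actions are dicts with `pre` and `eff` (a hypothetical encoding).

```python
def apply(s, o):
    """apply(s, o), or None if o is not applicable in s."""
    if any(s[var] != val for var, val in o["pre"].items()):
        return None
    t = list(s)
    for var, val in o["eff"].items():
        t[var] = val
    return tuple(t)


def run(s, path):
    """Execute a path from s; None if some action is not applicable."""
    for o in path:
        if s is None:
            return None
        s = apply(s, o)
    return s


def left_commutative_in(s, a, b):
    """Definition 13: s : b => a."""
    s_ab = run(s, [a, b])
    s_ba = run(s, [b, a])
    return s_ab is not None and s_ab == s_ba


a = {"pre": {0: 0}, "eff": {1: 1}}          # x1 = 0  ->  x2 = 1
b = {"pre": {1: 1, 2: 2}, "eff": {2: 3}}    # x2 = 1, x3 = 2  ->  x3 = 3
s1 = (0, 1, 2)
s2 = (0, 0, 2)
print(left_commutative_in(s1, a, b))  # True
print(left_commutative_in(s2, a, b))  # False: (a, b) is valid, but b is not applicable at s2
```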

We introduce state-independent left commutativity as it can be used to derive 
sufficient conditions for finding stubborn sets. 

Definition 15 (State-Independent Left Commutative Set). For a SAS+ planning task, a set of actions $T(s)$ is a left commutative set at a state $s$ if and only if

L1) For any action $b \in T(s)$ and any action $a \in O - T(s)$, if there exists a valid path from $s$ to a goal state that contains both $a$ and $b$, then $b \Rightarrow a$; and

A2) Any valid path from $s$ to a goal state contains at least one action in $T(s)$.

Theorem 2. For a SAS+ planning task, for a state s, if a set of actions T(s) 
is a state-independent left commutative set, it is also a stubborn set. 

Proof. We only need to prove that L1 in Definition 15 implies A1 in Definition 11. The proof strategy is schematically shown in Figure 2.

For an action $b \in T(s)$ and actions $b_1, \ldots, b_k \notin T(s)$, if $(b_1, \ldots, b_k, b)$ is a prefix of a path from $s$ to a goal state, then according to L1, we see that $b \Rightarrow b_i$ for $i = 1, \ldots, k$. According to the definition of left commutativity, $b_k$ and $b$ can be swapped, and the resulting path $(b_1, \ldots, b_{k-1}, b, b_k)$ is still a valid path that leads to the same state that $(b_1, \ldots, b_k, b)$ does. We can subsequently swap $b$ with $b_{k-1}, \ldots, b_1$ to obtain equivalent paths, before finally obtaining $(b, b_1, \ldots, b_k)$, as shown in the right part of Figure 2. Hence, we have shown that if $p = (b_1, \ldots, b_k, b)$ is a prefix of a path from $s$ to a goal state, then $p' = (b, b_1, \ldots, b_k)$ is also a valid path from $s$ that leads to the same state that $p$ does, which is exactly condition A1 in Definition 11. ■
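The swapping argument in this proof can be replayed on a concrete path. The Python sketch below (same hypothetical tuple/dict encoding as the earlier examples, with made-up action names) moves the last action of a path to the front through adjacent swaps and checks after every swap that the path is still valid and reaches the same state. The second example fails because `b_dep` needs an effect of `b1`, i.e. $b \Rightarrow b_1$ does not hold.

```python
def apply(s, o):
    if any(s[var] != val for var, val in o["pre"].items()):
        return None
    t = list(s)
    for var, val in o["eff"].items():
        t[var] = val
    return tuple(t)


def run(s, path, actions):
    for name in path:
        if s is None:
            return None
        s = apply(s, actions[name])
    return s


def bubble_to_front(s, path, actions):
    """Swap the last action leftwards, as in the proof of Theorem 2.

    Returns every intermediate path; raises ValueError if a swap breaks
    validity or changes the resulting state.
    """
    target = run(s, path, actions)
    if target is None:
        raise ValueError("path is not valid at s")
    cur = list(path)
    steps = [list(cur)]
    for j in range(len(cur) - 1, 0, -1):
        cur[j - 1], cur[j] = cur[j], cur[j - 1]   # swap b with its left neighbour
        if run(s, cur, actions) != target:
            raise ValueError(f"swap failed at {cur}")
        steps.append(list(cur))
    return steps


acts = {"b1": {"pre": {}, "eff": {1: 1}},
        "b2": {"pre": {}, "eff": {2: 1}},
        "b": {"pre": {}, "eff": {0: 1}},
        "b_dep": {"pre": {1: 1}, "eff": {0: 1}}}   # needs b1's effect
print(bubble_to_front((0, 0, 0), ["b1", "b2", "b"], acts))
# [['b1', 'b2', 'b'], ['b1', 'b', 'b2'], ['b', 'b1', 'b2']]
```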

From the above proof, we see that the requirement of state-independent left commutativity in Definition 15 is unnecessarily strong. Instead, only certain state-dependent left commutativity is necessary. In fact, when we change $(b_1, \ldots, b_k, b)$ to $(b_1, \ldots, b_{k-1}, b, b_k)$, we only require $s' : b \Rightarrow b_k$, where $s'$ is the state after $b_{k-1}$ is executed. Similarly, when we change $(b_1, \ldots, b_{k-1}, b, b_k)$ to $(b_1, \ldots, b_{k-2}, b, b_{k-1}, b_k)$, we only require $s'' : b \Rightarrow b_{k-1}$, where $s''$ is the state after $b_{k-2}$ is executed. Based on the above analysis, we can refine the sufficient conditions.

Definition 16 (State-Dependent Left Commutative Set). For a SAS+ planning task, a set of actions $T(s)$ is a left commutative set at a state $s$ if and only if

L1') For any action $b \in T(s)$ and actions $b_1, \ldots, b_k \notin T(s)$, if $(b_1, \ldots, b_k, b)$ is a prefix of a path from $s$ to a goal state, then $s' : b \Rightarrow b_k$, where $s'$ is the state after $(b_1, \ldots, b_{k-1})$ is executed; and

A2) Any valid path from $s$ to a goal state contains at least one action in $T(s)$.

We only need to slightly modify the proof to Theorem 2 in order to prove the 
following theorem. 

Theorem 3. For a SAS+ planning task, for a state $s$, if a set of actions $T(s)$ is a state-dependent left commutative set, it is also a stubborn set.

The above result gives sufficient conditions for finding stubborn sets in planning. The concept of a state-dependent left commutative set requires a less stringent condition than that of a state-independent left commutative set; therefore, it can result in smaller $T(s)$ sets and stronger reduction. This nuance also underlies the differing performance of previous POR algorithms. Next, we present our algorithm for finding such a set at each state to satisfy these conditions.

3.4 Determining left commutativity 

Theorem 3 provides a key result for POR. However, the conditions in Definition 13 
are still abstract and not directly implementable. The key issue is to efficiently find 
left commutative action pairs. Now we give necessary and sufficient conditions for 
Definition 13 that can practically determine left commutativity and facilitate the 
design of reduction algorithms. 

Theorem 4. For a SAS+ planning task, for a valid action path $(a, b)$ in state $s$, we have $s : b \Rightarrow a$ if and only if $pre(a)$ and $eff(b)$, $pre(b)$ and $eff(a)$, and $eff(a)$ and $eff(b)$ are all conflict-free and $b$ is applicable at $s$.

Proof. First, from the definition of $s : b \Rightarrow a$, we know that action $b$ is applicable in state $apply(s, a)$. This implies that $pre(b)$ and $eff(a)$ are conflict-free. Symmetrically, since action $a$ is applicable in state $apply(s, b)$, $pre(a)$ and $eff(b)$ are also conflict-free; moreover, $b$ is applicable at $s$ because $(b, a)$ is a valid path at $s$. Now we prove that $eff(a)$ and $eff(b)$ are conflict-free by contradiction. If $eff(a)$ and $eff(b)$ are not conflict-free, without loss of generality, we can assume that $eff(a)$ contains $x_i = v_i$ and $eff(b)$ contains $x_i = v_i' \neq v_i$. Thus, the value of $x_i$ is $v_i'$ in state $s_{ab} = apply(apply(s, a), b)$ and $v_i$ in state $s_{ba} = apply(apply(s, b), a)$, i.e., $s_{ab}$ is different from $s_{ba}$. This contradicts our assumption that $a$ and $b$ are left commutative. Thus, $eff(a)$ and $eff(b)$ are conflict-free.

Second, suppose that $pre(a)$ and $eff(b)$, $eff(a)$ and $pre(b)$, and $eff(a)$ and $eff(b)$ are all conflict-free and that $b$ is applicable at $s$. Since $a$ is applicable in $s$, $a$ is also applicable in state $apply(s, b)$, as $pre(a)$ and $eff(b)$ are conflict-free. Hence, $(b, a)$ is a valid path at $s$. Also, for any state variable $x_i$, its values in states $s_{ab} = apply(apply(s, a), b)$ and $s_{ba} = apply(apply(s, b), a)$ are the same, because $eff(a)$ and $eff(b)$ are conflict-free. Therefore, $s_{ab} = s_{ba}$, and we have $s : b \Rightarrow a$. ■
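Theorem 4 replaces simulation with a purely syntactic test. The Python sketch below implements that test and, as a sanity check rather than a proof, compares it with the simulation-based check of Definition 13 on every state of a small hypothetical domain (three variables with values 0 to 3) and every ordered pair of four made-up actions for which $(a, b)$ is valid.

```python
from itertools import product


def conflict_free(p, q):
    return all(q[var] == val for var, val in p.items() if var in q)


def apply(s, o):
    if any(s[var] != val for var, val in o["pre"].items()):
        return None
    t = list(s)
    for var, val in o["eff"].items():
        t[var] = val
    return tuple(t)


def run(s, path):
    for o in path:
        if s is None:
            return None
        s = apply(s, o)
    return s


def lc_theorem4(s, a, b):
    """s : b => a via Theorem 4; assumes (a, b) is valid at s."""
    return (conflict_free(a["pre"], b["eff"])
            and conflict_free(b["pre"], a["eff"])
            and conflict_free(a["eff"], b["eff"])
            and apply(s, b) is not None)          # b applicable at s


def lc_simulation(s, a, b):
    """s : b => a via Definition 13."""
    s_ab, s_ba = run(s, [a, b]), run(s, [b, a])
    return s_ab is not None and s_ab == s_ba


acts = [{"pre": {0: 0}, "eff": {1: 1}},
        {"pre": {1: 1, 2: 2}, "eff": {2: 3}},
        {"pre": {}, "eff": {1: 2}},
        {"pre": {2: 3}, "eff": {0: 1}}]
checked = 0
for s in product(range(4), repeat=3):
    for a, b in product(acts, repeat=2):
        if run(s, [a, b]) is not None:           # Theorem 4 assumes (a, b) valid
            assert lc_theorem4(s, a, b) == lc_simulation(s, a, b)
            checked += 1
print(checked > 0)  # True
```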
Fig. 3. A SAS+ task with four DTGs, shown with the state $s = (s_1, s_2, s_3, s_4)$ and its potential dependency graph $PDG(s)$. The dashed arrows show preconditions (prevailing and transitional) of each edge (action). Actions are marked with letters a to f. We see that b and e are associated with more than one DTG.

Theorem 4 gives necessary and sufficient conditions for deciding whether two 
actions are left-commutative or not. Based on this result, we later develop practical 
POR algorithms that find stubborn sets using left commutativity. 

4. EXPLANATION OF PREVIOUS POR ALGORITHMS 

Previously, we proposed two POR algorithms for planning: expansion core (EC) [Chen and Yao 2009] and stratified planning (SP) [Chen et al. 2009], both of which showed good performance in reducing the search space. However, we did not have a unified explanation for them. We now show how both algorithms can be explained by our theory. Full details of the two algorithms can be found in our papers [Chen and Yao 2009; Chen et al. 2009].

4.1 Explanation of EC 

The expansion core (EC) algorithm is a POR-based reduction algorithm for planning. We will see that, in essence, the EC algorithm exploits the SAS+ formalism to find a left commutative set for each state. To describe the EC algorithm, we need the following definitions.

Definition 17. For a SAS+ task, for each DTG $G_i$, $i = 1, \ldots, N$, and a vertex $v \in V(G_i)$, an edge $e \in E(G_i)$ is a potential descendant edge of $v$ (denoted as $v \lhd e$) if 1) $G_i$ is goal-related and there exists a path from $v$ to the goal state in $G_i$ that contains $e$; or 2) $G_i$ is not goal-related and $e$ is reachable from $v$.

Definition 18. For a SAS+ task, for each DTG $G_i$, $i = 1, \ldots, N$, and a vertex $v \in V(G_i)$, a vertex $w \in V(G_i)$ is a potential descendant vertex of $v$ (denoted as $v \lhd w$) if 1) $G_i$ is goal-related and there exists a path from $v$ to the goal state in $G_i$ that contains $w$; or 2) $G_i$ is not goal-related and $w$ is reachable from $v$.

Definition 19. For a SAS+ task, given a state $s = (s_1, \ldots, s_N)$, for any $1 \le i, j \le N$, $i \neq j$, we call $s_i$ a potential precondition of the DTG $G_j$ if there exist $o \in O$ and $e_j \in E(G_j)$ such that

$$s_j \lhd e_j, \quad o \vdash e_j, \quad \text{and} \quad s_i \in pre(o). \tag{1}$$
Definition 20. For a SAS+ task, given a state $s = (s_1, \ldots, s_N)$, for any $1 \le i, j \le N$, $i \neq j$, we call $s_i$ a potential dependent of the DTG $G_j$ if there exist $o \in O$, $e_i = (s_i, s_i') \in E(G_i)$ and $w_j \in V(G_j)$ such that

$$s_j \lhd w_j, \quad o \vdash e_i, \quad \text{and} \quad w_j \in pre(o). \tag{2}$$

Definition 21. For a SAS+ task and a state $s = (s_1, \ldots, s_N)$, its potential dependency graph $PDG(s)$ is a directed graph in which each DTG $G_i$, $i = 1, \ldots, N$, corresponds to a vertex, and there is an edge from $G_i$ to $G_j$, $i \neq j$, if and only if $s_i$ is a potential precondition or potential dependent of $G_j$.

Figure 3 illustrates the above definitions. In $PDG(s)$, $G_1$ points to $G_2$ as $s_1$ is a potential precondition of $G_2$, and $G_2$ points to $G_1$ as $s_2$ is a potential dependent of $G_1$.

Definition 22. For a directed graph $H$, a subset $C$ of $V(H)$ is a dependency closure if there do not exist $v \in C$ and $w \in V(H) - C$ such that $(v, w) \in E(H)$.

Intuitively, a DTG in a dependency closure may depend on other DTGs in the closure but not on DTGs outside of the closure. In Figure 3, $G_1$ and $G_2$ form a dependency closure of $PDG(s)$.
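Definition 22 amounts to checking that no edge leaves $C$. The Python sketch below performs that check; the edge list is a hypothetical PDG with vertices 1 to 4 (not the exact graph of Figure 3), in which $G_1$ and $G_2$ point to each other, $G_3$ points to $G_1$, and $G_4$ points to $G_3$.

```python
def is_dependency_closure(edges, closure):
    """Definition 22: no edge (v, w) with v inside and w outside the closure."""
    return not any(v in closure and w not in closure for v, w in edges)


pdg_edges = [(1, 2), (2, 1), (3, 1), (4, 3)]   # hypothetical PDG(s)
print(is_dependency_closure(pdg_edges, {1, 2}))        # True
print(is_dependency_closure(pdg_edges, {1}))           # False: edge (1, 2) leaves it
print(is_dependency_closure(pdg_edges, {1, 2, 3, 4}))  # True: the whole graph
```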

The EC algorithm is defined as follows: 

Definition 23 (Expansion Core Algorithm). For a SAS+ planning task, the EC method reduces its state space graph $\mathcal{G}$ to a subgraph $\mathcal{G}_r$ such that $V(\mathcal{G}_r) = V(\mathcal{G})$ and, for each vertex (state) $s \in V(\mathcal{G})$, it expands actions in the following set $T(s) \subseteq O$:

$$T(s) = \bigcup_{i \in C(s)} \{\, o \mid o \in exec(s) \wedge o \vdash G_i \,\}, \tag{3}$$

where $exec(s)$ is the set of executable actions in $s$ and $C(s) \subseteq \{1, \ldots, N\}$ is an index set satisfying:

EC1) The DTGs $\{G_i, i \in C(s)\}$ form a dependency closure in $PDG(s)$; and

EC2) There exists $i \in C(s)$ such that $G_i$ is goal-related and $s_i$ is not the goal state in $G_i$.
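Given an index set $C(s)$, equation (3) is a simple filter over the executable actions. The Python sketch below evaluates it with the hypothetical tuple/dict encoding used earlier (0-based variable indices); by Definition 4, an action is associated with $G_i$ exactly when $eff(o)$ assigns $x_i$. Finding $C(s)$ itself is the harder step and is not shown here.

```python
def ec_expansion_set(s, actions, closure):
    """Equation (3): executable actions associated with some G_i, i in C(s)."""
    def executable(o):
        return all(s[var] == val for var, val in o["pre"].items())

    return {name for name, o in actions.items()
            if executable(o) and any(i in o["eff"] for i in closure)}


acts = {"a": {"pre": {0: 0}, "eff": {0: 1}},
        "b": {"pre": {}, "eff": {1: 1, 2: 1}},   # assigns two variables
        "c": {"pre": {}, "eff": {2: 2}},
        "d": {"pre": {0: 1}, "eff": {1: 0}}}     # not executable in (0, 0, 0)
print(sorted(ec_expansion_set((0, 0, 0), acts, {0, 1})))  # ['a', 'b']
```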

Intuitively, the EC method can be described as follows. To reduce the original 
state-space graph, for each state, instead of expanding actions in all the DTGs, 
it only expands actions in DTGs that belong to a dependency closure of PDG(s) 
under the condition that at least one DTG in the dependency closure is goal-related 
and not at a goal state. 

The set $C(s)$ can always be found for any non-goal state $s$, since the whole vertex set of $PDG(s)$ is always such a dependency closure. If there is more than one such closure, theoretically any dependency closure satisfying the above conditions can be used in EC. In practice, when there are multiple such dependency closures, EC picks the one with fewer actions in order to get stronger reduction. EC adopts the following scheme to find the dependency closure for any state $s$.

Given a PDG(s), EC first finds its strongly connected components (SCCs). If each SCC is contracted to a single vertex, the resulting graph is a directed acyclic graph $\mathcal{S}$. Note that each vertex in $\mathcal{S}$ with a zero out-degree corresponds to a dependency closure. EC then topologically sorts all the vertices in $\mathcal{S}$ to get a sequence of SCCs $S_1, S_2, \dots$, and picks the minimum $m$ such that $S_m$ includes a goal-related DTG that is not in its goal state. It chooses all the DTGs in $S_1, \dots, S_m$ as the dependency closure.
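The closure-selection scheme above can be sketched in Python as follows. This is an illustrative sketch, not the EC code: we assume the sequence $S_1, S_2, \dots$ lists every SCC after all SCCs it points to (a reverse topological order), which guarantees that every prefix $S_1, \dots, S_m$ is a dependency closure; Tarjan's algorithm emits SCCs in exactly such an order. The parameter `needs_work` (the goal-related DTGs not yet at their goal) and the function names are hypothetical.

```python
def tarjan_sccs(graph):
    """Strongly connected components of `graph` (dict: vertex -> successors).

    Tarjan's algorithm emits an SCC only after every SCC reachable from it,
    so the returned list is in reverse topological order (sinks first).
    """
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)  # next unused DFS number
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def ec_dependency_closure(pdg, needs_work):
    """Union of S_1..S_m, where S_m is the first SCC containing a DTG in `needs_work`."""
    chosen = set()
    for scc in tarjan_sccs(pdg):
        chosen |= scc
        if scc & needs_work:
            return chosen
    return None  # no unachieved goal-related DTG, so s is a goal state


pdg = {"G1": {"G2"}, "G2": {"G1"}, "G3": {"G1"}, "G4": set()}
print(sorted(ec_dependency_closure(pdg, {"G3"})))  # ['G1', 'G2', 'G3']
```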

Now we explain the EC algorithm using the POR theory we developed in Section 3. We show that the EC algorithm can be viewed as an algorithm for finding a state-dependent left-commutative set in each state.

Lemma 1. For a SAS+ planning task, the EC algorithm defines a state-dependent left-commutative set for each state.

Proof. Consider the set of actions T(s) expanded by the EC algorithm in each state s, as defined in (3). We prove that T(s) satisfies conditions L1' and A2 in Definition 16.

Consider an action $b \in T(s)$ and actions $b_1, \dots, b_k \notin T(s)$ such that $(b_1, \dots, b_k, b)$ is a prefix of a path from s to a goal state. We show that $s' : b \Rightarrow b_k$, where $s'$ is the state after $(b_1, \dots, b_{k-1})$ is applied to s.

Let C(s) be the index set of the DTGs that form a dependency closure, as used in (3). Since $b \in T(s)$, there must exist $m \in C(s)$ such that $b \vdash G_m$. Let the state after applying $(b_1, \dots, b_k)$ to s be $s^*$. We must have $s^*_m = s_m$, because otherwise there must exist a $b_j$, $1 \le j \le k$, that changes the assignment of state variable $x_m$. However, that would imply that $b_j \in T(s)$, contradicting $b_j \notin T(s)$. Since b is applicable in $s^*$, we see that $s_m = s^*_m \in pre(b)$.

If there exists a state variable $x_i$ such that an assignment to $x_i$ is in both $eff(b_k)$ and $pre(b)$, then $G_m$ will point to the DTG $G_i$, since $s_m$ is a potential dependent of $G_i$, forcing $G_i$ to be included in the dependency closure, i.e., $i \in C(s)$. However, as $b_k \vdash G_i$, this would violate our assumption that $b_k \notin T(s)$. Hence, none of the precondition assignments of b is added by $b_k$. Therefore, since b is applicable in $apply(s', b_k)$, it is also applicable in $s'$.

On the other hand, if $b_k$ has a precondition assignment in a DTG that b is associated with, then $G_m$ will point to that DTG, since $s_m$ is a potential precondition of $b_k$, forcing that DTG to be in C(s), which contradicts the assumption that $b_k \notin T(s)$. Hence, b does not alter any precondition assignment of $b_k$. Therefore, since $b_k$ is applicable in $s'$, it is also applicable in the state $apply(s', b)$.

Finally, if there exists a state variable $x_i$ such that an assignment to $x_i$ is altered by both b and $b_k$, then we know $b \vdash G_i$ and $b_k \vdash G_i$. In this case, $G_m$ will point to $G_i$, since $s_m$ is a potential precondition of $G_i$, making $b_k \in T(s)$, which contradicts our assumption. Hence, eff(b) and $eff(b_k)$ correspond to assignments to distinct sets of state variables. Therefore, applying $(b_k, b)$ and $(b, b_k)$ to $s'$ will lead to the same state.

From the above, we see that b is applicable in $s'$ and $b_k$ is applicable in $apply(s', b)$; hence $(b, b_k)$ is applicable in $s'$. Further, $(b, b_k)$ leads to the same state as $(b_k, b)$ does when applied to $s'$. We conclude that $s' : b \Rightarrow b_k$ and that T(s) satisfies L1'.

Moreover, for any goal-related DTG $G_i$, if in a state s its assignment $s_i$ is not the goal state in $G_i$, then some actions associated with $G_i$ have to be executed in any solution path from s. Since T(s) includes all the actions in at least one such goal-related DTG $G_i$, any solution path must contain at least one action in T(s). Therefore, T(s) also satisfies A2, and it is indeed a state-dependent left-commutative set. ■

From Lemma 1 and Theorem 3, we obtain the following result, which shows that 
EC fits our framework as a stubborn set method for planning. 

Theorem 5. For any SAS+ planning task, the EC algorithm defines a stubborn 
set in each state. 

4.2 Explanation of SP 

The stratified planning (SP) algorithm exploits commutativity of actions directly [Chen et al. 2009]. To describe the SP algorithm, we need the following definitions first.

Definition 24. Given a SAS+ planning task $\Pi$ with state variable set $X$, the causal graph (CG) is a directed graph $CG(\Pi) = (X, E)$ with $X$ as the vertex set. There is an edge $(x, x') \in E$ if and only if $x \neq x'$ and there exists an action o such that $x \in eff(o)$ and $x' \in pre(o) \cup eff(o)$.

Definition 25. For a SAS+ task $\Pi$, a stratification of the causal graph $CG(\Pi) = (X, E)$ is a partition of the node set $X$: $X = (X_1, \dots, X_k)$ in such a way that there exists no edge $e = (x, y)$ where $x \in X_i$, $y \in X_j$ and $i > j$.

By stratification, each state variable x is assigned a level L(x), where L(x) = i if $x \in X_i$, $1 \le i \le k$. Subsequently, each action o is assigned a level L(o), $1 \le L(o) \le k$, which is the level of the state variable(s) in eff(o). Note that all state variables in the same eff(o) must be in the same level: by Definition 24, any two distinct variables in eff(o) are joined by edges in both directions, so Definition 25 forces their levels to be equal. Hence, L(o) is well-defined.
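Definitions 24 and 25 and the action levels can be checked mechanically. A minimal Python sketch, assuming each action is a pair `(pre, eff)` of dicts from state variables to values and a candidate stratification is a dict `level` from variables to integers (both representations are our own choices for illustration):

```python
from itertools import product


def causal_graph(actions):
    """Definition 24: edge (x, x') iff x != x', x in eff(o), x' in pre(o) or eff(o)."""
    edges = set()
    for pre, eff in actions:
        for x, y in product(eff, set(pre) | set(eff)):
            if x != y:
                edges.add((x, y))
    return edges


def is_stratification(edges, level):
    """Definition 25: no edge may run from a higher level to a lower one."""
    return all(level[x] <= level[y] for x, y in edges)


def action_level(eff, level):
    """L(o): the single level shared by all state variables in eff(o)."""
    levels = {level[x] for x in eff}
    if len(levels) != 1:
        raise ValueError("effect variables of one action must share a level")
    return levels.pop()


# Hypothetical action that reads x1 and writes x2: the causal graph gets edge (x2, x1).
edges = causal_graph([({"x1": 0}, {"x2": 1})])
print(edges)                                         # {('x2', 'x1')}
print(is_stratification(edges, {"x1": 2, "x2": 1}))  # True
print(is_stratification(edges, {"x1": 1, "x2": 2}))  # False
```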

Definition 26 Follow-up Action. For a SAS+ task $\Pi$, an action b is a follow-up action of a (denoted as $a \triangleright b$) if $eff(a) \cap pre(b) \neq \emptyset$ or $eff(a) \cap eff(b) \neq \emptyset$.

The SP algorithm can be combined with standard search algorithms, such as breadth-first search, depth-first search, and best-first search (including A*). During the search, for each state s that is going to be expanded, the SP algorithm examines the action a that leads to s. Then, for each applicable action b in state s, SP makes the following decisions.

Definition 27 Stratified Planning Algorithm. For a SAS+ planning task, in any non-initial state s, assuming a is the action that leads directly to s and b is an applicable action in s, SP does not expand b if $L(b) < L(a)$ and b is not a follow-up action of a. Otherwise, SP expands b. In the initial state $s_0$, SP expands all applicable actions.
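The expansion test of Definitions 26 and 27 is local: it only looks at the action a that led to s. A minimal Python sketch, assuming actions are `(pre, eff)` pairs of dicts and reading the intersections of Definition 26 as shared state variables (the reading used in the proof of Lemma 2); `level` maps state variables to their stratum, and all names are hypothetical:

```python
def follows_up(a, b):
    """Definition 26: eff(a) shares a variable with pre(b) or with eff(b)."""
    (_, eff_a), (pre_b, eff_b) = a, b
    return bool(set(eff_a) & (set(pre_b) | set(eff_b)))


def level_of(action, level):
    """L(o): level of any effect variable (all of them share one level)."""
    _, eff = action
    return level[next(iter(eff))]


def sp_expands(prev, b, level):
    """Definition 27: does SP expand applicable action b after action prev?

    prev is None in the initial state, where every applicable action is expanded.
    """
    if prev is None:
        return True
    pruned = level_of(b, level) < level_of(prev, level) and not follows_up(prev, b)
    return not pruned


# The two-variable example discussed after Theorem 6.
a = ({"x1": 0}, {"x1": 1})
b = ({"x2": 0}, {"x2": 1})
level = {"x1": 2, "x2": 1}
print(sp_expands(a, b, level))  # False: L(b) < L(a) and b does not follow up a
print(sp_expands(b, a, level))  # True
```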

The following result shows the relationship between the SP algorithm and our 
new POR theory. 

Lemma 2. If an action b is not SP-expandable after a, and state s is the state before action a, then $s : b \Rightarrow a$.

Proof. Since b is not SP-expandable after a, following the SP algorithm, we have $L(a) > L(b)$ and b is not a follow-up action of a. According to Definition 26, we have $eff(a) \cap pre(b) = eff(a) \cap eff(b) = \emptyset$. These imply that eff(a) and pre(b) are conflict-free, and that eff(a) and eff(b) are conflict-free. Also, since b is applicable in apply(s, a) and eff(a) and pre(b) are conflict-free, b must be applicable in s (otherwise eff(a) must change the value of at least one variable in pre(b), which means eff(a) and pre(b) are not conflict-free).

Now we prove that pre(a) and eff(b) are conflict-free by showing $pre(a) \cap eff(b) = \emptyset$. Suppose their intersection is non-empty, and let x be a state variable assigned by both pre(a) and eff(b). By the definition of action levels, x is in layer L(b). However, since x is assigned by pre(a), the causal graph contains an edge from each effect variable of a, which lies in layer L(a), to x, which lies in layer $L(x) = L(b)$; these variables are distinct because $L(a) \neq L(b)$. By the definition of stratification, this edge implies $L(a) \le L(b)$, and hence $L(a) < L(b)$. This contradicts the assumption that $L(a) > L(b)$. Thus, $pre(a) \cap eff(b) = \emptyset$, and pre(a) and eff(b) are conflict-free.

With all three conflict-free pairs, we have $s : b \Rightarrow a$ according to Theorem 2. ■

Although SP reduces the search space by avoiding the expansion of certain actions, it is in fact not a stubborn set based reduction algorithm. We have the following theorem for the SP algorithm.

Definition 28. For a SAS+ planning task S, a valid path $p_a = (a_1, \dots, a_n)$ is an SP-path if and only if $p_a$ is a path in the search space of the SP algorithm applied to S.

Theorem 6. For a SAS+ planning task S, for any initial state $s_0$ and any valid path $p_a = (a_1, \dots, a_n)$ from $s_0$, there exists a path $p_b = (b_1, \dots, b_n)$ from $s_0$ such that $p_b$ is an SP-path, both $p_a$ and $p_b$ lead to the same state from $s_0$, and $p_b$ is a permutation of the actions in $p_a$.

Proof. We prove this by induction on the number of actions.

When $n = 1$, since there is no action before $s_0$, any valid path $(a_1)$ is also a valid path in the search space of the SP algorithm.

Now we assume the proposition is true for $n = k$, $k \ge 1$, and prove the case $n = k + 1$. For a valid path $p_a = p^0 = (a_1, \dots, a_k, a_{k+1})$, by the induction hypothesis, we can rearrange the first $k$ actions to obtain an SP-path $(a^1_1, a^1_2, \dots, a^1_k)$ that leads $s_0$ to the same state.

Now we consider the new path $p^1 = (a^1_1, \dots, a^1_k, a_{k+1})$. There are two cases. First, if $L(a_{k+1}) \ge L(a^1_k)$, or $L(a_{k+1}) < L(a^1_k)$ and $a_{k+1}$ is a follow-up action of $a^1_k$, then $p^1$ is already an SP-path. Otherwise, we have $L(a_{k+1}) < L(a^1_k)$ and $a_{k+1}$ is not a follow-up action of $a^1_k$. In this case, by Lemma 2, the path $p^{1\prime} = (a^1_1, \dots, a^1_{k-1}, a_{k+1}, a^1_k)$ is also a valid path that leads $s_0$ to the same state as $p_a$ does.

By the induction hypothesis, if $p^{1\prime}$ is still not an SP-path, we can rearrange its first $k$ actions to get a new path $p^2 = (a^2_1, \dots, a^2_k, a^1_k)$; otherwise, we let $p^2 = p^{1\prime}$. Comparing $p^1$ and $p^2$, we know $L(a^1_k) > L(a_{k+1})$; namely, the level of the last action in $p^2$ is strictly larger than that in $p^1$. We can repeat the above process to generate $p^3, \dots, p^j, \dots$ as long as $p^j$ ($j \in \mathbb{Z}^+$) is not an SP-path. Our transformation from $p^j$ to $p^{j+1}$ also ensures that every $p^j$ is a valid path from $s_0$ and leads to the same state that $p_a$ does.

Since the level of the last action in $p^j$ strictly increases as $j$ increases and is bounded by the number of strata, this process must stop after a finite number of iterations. Suppose it finally stops at $p^m = (a'_1, a'_2, \dots, a'_k, a'_{k+1})$; then we must have $L(a'_{k+1}) \ge L(a'_k)$, or $L(a'_{k+1}) < L(a'_k)$ and $a'_{k+1}$ is a follow-up action of $a'_k$. Hence, $p^m$ is an SP-path. We then let $p_b = p^m$, and the induction step is proved. ■
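The swapping argument in the proof of Theorem 6 is constructive. One concrete way to carry it out, sketched below in Python, is an insertion pass: each newly appended action moves left while SP would prune it after its predecessor. The sketch works on abstract action labels and takes the level function and follow-up test as parameters (`level_of` and `follows_up` are hypothetical names). It assumes, as Lemma 2 guarantees for valid paths, that every such swap keeps the path valid and its end state unchanged; it does not check validity itself.

```python
def to_sp_path(path, level_of, follows_up):
    """Permute `path` into an SP-path by repeated adjacent swaps (Theorem 6).

    Invariant: before each new action is placed, `result` is already an SP-path.
    """
    result = []
    for action in path:
        result.append(action)
        j = len(result) - 1
        # Move the new action left while SP would refuse to expand it after
        # its predecessor (lower level and not a follow-up action).
        while j > 0:
            prev, cur = result[j - 1], result[j]
            if level_of(cur) < level_of(prev) and not follows_up(prev, cur):
                result[j - 1], result[j] = cur, prev  # justified by Lemma 2
                j -= 1
            else:
                break
    return result


def never_follows_up(prev, cur):
    return False


levels = {"a": 2, "b": 1, "c": 3}
print(to_sp_path(["c", "a", "b"], levels.get, never_follows_up))  # ['b', 'a', 'c']
```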
Theorem 6 shows that the SP algorithm cannot reduce the number of states expanded in the search space. The reason is as follows: for any state s in the original search space that is reachable from the initial state $s_0$ via a path p, there is still an SP-path that reaches s. Therefore, every reachable state in the search space is still reachable by the SP algorithm. In other words, SP reduces the number of generated states, but not the number of expanded states.

SP is not a stubborn set based reduction algorithm. This can be illustrated by 
the following example. 

Assume a SAS+ planning task S that contains two state variables $x_1$ and $x_2$, both with domain {0, 1}, with the initial state $\{x_1 = 0, x_2 = 0\}$ and the goal $\{x_1 = 1, x_2 = 1\}$. Actions a and b are two actions in S, where $pre(a) = \{x_1 = 0\}$, $eff(a) = \{x_1 = 1\}$, $pre(b) = \{x_2 = 0\}$, and $eff(b) = \{x_2 = 1\}$. It is easy to see that a and b are not follow-up actions of each other, and that, since the causal graph has no edge between $x_1$ and $x_2$, a stratification may place them in different layers. Without loss of generality, we can assume $L(a) = L(x_1) > L(x_2) = L(b)$. Therefore, action b will not be expanded after action a in state $s = \{x_1 = 1, x_2 = 0\}$, and since a is not applicable in s, SP expands nothing in s. However, apply(s, b) is the goal. Not expanding b in state s violates condition A2 in Definition 11, which requires any valid path from s to a goal state to contain at least one action in the expansion set of s.

We can also see in the above example that the search space explored by SP contains four states, namely, the initial state $s_0$, $apply(s_0, a)$, $apply(s_0, b)$, and the goal state. Meanwhile, under the EC algorithm, in state $s_0$ the DTGs for $x_1$ and $x_2$ are not in each other's dependency closures. This implies that in $s_0$, EC expands either action a or b, but not both. Therefore, EC expands three states while SP expands four. This illustrates our conclusion in Theorem 6 that the SP algorithm cannot reduce the number of expanded states.
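To make this comparison concrete, the following Python sketch enumerates the states reached in the two-variable example under the SP rule of Definition 27 and under a simplified EC rule. The simplified EC rule is our assumption for this particular task only: since a and b touch disjoint variables, we take the DTG of the first unachieved goal variable as the dependency closure. Duplicate states are detected by their variable assignment alone; all names are hypothetical.

```python
from collections import deque

# The task from the example: name -> (pre, eff, level of the effect variable).
ACTIONS = {
    "a": ({"x1": 0}, {"x1": 1}, 2),  # L(a) = L(x1) = 2
    "b": ({"x2": 0}, {"x2": 1}, 1),  # L(b) = L(x2) = 1
}
INIT = {"x1": 0, "x2": 0}
GOAL = {"x1": 1, "x2": 1}


def applicable(state):
    return [n for n, (pre, _, _) in ACTIONS.items()
            if all(state[v] == val for v, val in pre.items())]


def successor(state, name):
    return {**state, **ACTIONS[name][1]}


def explore(choose):
    """Breadth-first search over reachable states; `choose(state, prev)`
    returns the actions to expand in `state` reached by action `prev`."""
    seen, queue = set(), deque([(INIT, None)])
    while queue:
        state, prev = queue.popleft()
        key = tuple(sorted(state.items()))
        if key in seen:
            continue
        seen.add(key)
        for name in choose(state, prev):
            queue.append((successor(state, name), name))
    return seen


def sp_choose(state, prev):
    """Definition 27, with the follow-up test on shared variables."""
    if prev is None:
        return applicable(state)
    _, eff_p, lvl_p = ACTIONS[prev]
    kept = []
    for n in applicable(state):
        pre, eff, lvl = ACTIONS[n]
        follow_up = bool(set(eff_p) & (set(pre) | set(eff)))
        if not (lvl < lvl_p and not follow_up):
            kept.append(n)
    return kept


def ec_choose(state, prev):
    """Simplified EC for this task only: the DTG of the first unachieved
    goal variable is assumed to be a valid dependency closure by itself."""
    for var, target in GOAL.items():
        if state[var] != target:
            return [n for n in applicable(state) if var in ACTIONS[n][1]]
    return []


print(len(explore(sp_choose)))  # 4
print(len(explore(ec_choose)))  # 3
```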

5. A NEW POR ALGORITHM FRAMEWORK FOR PLANNING 

We have developed a POR theory for planning and explained two previous POR 
algorithms using the theory. Now, based on the theory, we propose a new POR 
algorithm which is stronger than the previous EC algorithm. 

Our theory shows in Theorem 3 that the condition for enabling POR reduction is strongly related to the left commutativity of actions. In fact, constructing a stubborn set can be reduced to finding a left-commutative set. As we show in Theorem 5, the EC algorithm follows this idea. However, the basic unit of reduction in EC is the DTG (i.e., either all actions in a DTG are expanded or none of them are), which is not necessary according to our theory. Based on this insight, we propose a new algorithm that operates at the granularity of actions instead of DTGs.

Definition 29. For a state s, an action set L is a landmark action set if 
and only if any valid path starting from s to a goal state contains at least one action 
in L. 

Definition 30. For a SAS+ task, an action $a \in O$ is supported by an action b if and only if $pre(a) \cap eff(b) \neq \emptyset$.

Definition 31. For a state s, its action support graph (ASG) at s is defined as a directed graph in which each vertex is an action, and there is an edge from a to b if and only if a is not applicable in s and a is supported by b.

The above definition of ASG is a direct extension of the definition of a causal 
graph. Instead of having domains as basic units, here we directly use actions as 
basic units. 

Definition 32. For an action a and a state s, the action core of a at s, denoted by $AC_s(a)$, is the set of actions that are in the transitive closure of a in ASG(s). The action core for a given set of actions A is the union of the action cores of every action in A.
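A minimal Python sketch of Definitions 30 to 32, computing the action core by breadth-first search over ASG(s). It assumes actions are `(pre, eff)` pairs of dicts keyed by name, reads $pre(a) \cap eff(b) \neq \emptyset$ as "b produces an assignment $x = v$ that a requires", and includes the seed actions in the core; all three are our interpretations rather than details fixed by the text.

```python
from collections import deque


def is_applicable(pre, state):
    return all(state.get(v) == val for v, val in pre.items())


def supports(b, a, actions):
    """Definition 30, reading pre(a) ∩ eff(b) as shared assignments x = v."""
    pre_a, _ = actions[a]
    _, eff_b = actions[b]
    return any(v in pre_a and pre_a[v] == val for v, val in eff_b.items())


def action_core(targets, actions, state):
    """AC_s(A): all actions reachable from A in ASG(s) (Definitions 31 and 32).

    ASG(s) has an edge a -> b iff a is not applicable in s and b supports a.
    The seeds in `targets` are included in the result.
    """
    core, queue = set(targets), deque(targets)
    while queue:
        a = queue.popleft()
        if is_applicable(actions[a][0], state):
            continue  # applicable actions have no outgoing ASG edges
        for b in actions:
            if b not in core and supports(b, a, actions):
                core.add(b)
                queue.append(b)
    return core


# Hypothetical task: a needs y=1, b achieves y=1 but needs z=1, c achieves z=1.
actions = {
    "a": ({"y": 1}, {"g": 1}),
    "b": ({"z": 1}, {"y": 1}),
    "c": ({"z": 0}, {"z": 1}),
    "d": ({"w": 0}, {"w": 1}),
}
state = {"y": 0, "z": 0, "w": 0, "g": 0}
print(sorted(action_core({"a"}, actions, state)))  # ['a', 'b', 'c']
```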

Lemma 3. For a state s, if an action a is not applicable in s and there is a valid path p starting from s whose last action is a, then p contains an action b, $b \neq a$, $b \in AC_s(a)$.

Proof. We prove this by induction on the length of p.

In the base case where $|p| = 2$, we have $p = (b, a)$. Since a is not applicable in s, it must be supported by b. Thus, $b \in AC_s(a)$. Suppose the lemma is true for $2 \le |p| \le k - 1$; we prove the case $|p| = k$. For a valid path $p = (o_1, \dots, o_k)$ with $o_k = a$, again there exists an action b before a that supports a. If b is applicable in s, then $b \in AC_s(a)$. Otherwise, we have a path $p' = (o_1, \dots, b)$ with $2 \le |p'| \le k - 1$. Thus, by the induction assumption, $p'$ contains at least one action in $AC_s(b)$, which is a subset of $AC_s(a)$ according to Definitions 31 and 32. ■

Definition 33. Given a SAS+ planning task $\Pi$ with O as the set of all actions, for a state s and a set of actions A, the action closure of A at s, denoted by $C_s(A)$, is a subset of O and a superset of A such that for any action $a \in C_s(A)$ applicable in s and any action $b \in O \setminus C_s(A)$, eff(a) and eff(b) are conflict-free. In addition, if $pre(b) \cap s \neq \emptyset$, that is, if at least one precondition assignment of b holds in s, then eff(a) and pre(b) are conflict-free.

Intuitively, actions in $C_s(A)$ can be executed without affecting the completeness and optimality of search. Specifically, because any applicable action in $C_s(A)$ and any action not in $C_s(A)$ will not assign different values to the same state variable, for an action $a \in C_s(A)$ and an action $b \in O \setminus C_s(A)$ at s, the path (a, b) will lead to the same state that (b, a) does. Additionally, because pre(b) and eff(a) are conflict-free when $pre(b) \cap s \neq \emptyset$, executing action a will not affect the applicability of action b in the future. Therefore, actions in $C_s(A)$ can be safely expanded first during the search, while actions outside it can be expanded later.
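A direct Python transcription of the action closure as computed by Algorithm 1 is sketched below. It assumes actions are `(pre, eff)` pairs of dicts keyed by name, that two partial assignments are conflict-free when they never give the same state variable different values (the reading used in the intuition above), and it implements the test $pre(b) \cap s \neq \emptyset$ as "at least one precondition assignment of b holds in s". Function names are hypothetical.

```python
def conflict_free(p, q):
    """Partial assignments p and q never give one variable two different values."""
    return all(q[v] == val for v, val in p.items() if v in q)


def holds_somewhere(pre, state):
    """pre ∩ s ≠ ∅: some precondition assignment of the action holds in s."""
    return any(state.get(v) == val for v, val in pre.items())


def action_closure(seed, actions, state):
    """Algorithm 1: grow C(A) until no outside action conflicts with an
    action of C(A) that is applicable in `state`."""
    closure = set(seed)
    changed = True
    while changed:
        changed = False
        applicable = [a for a in closure
                      if all(state.get(v) == val for v, val in actions[a][0].items())]
        for a in applicable:
            _, eff_a = actions[a]
            for b, (pre_b, eff_b) in actions.items():
                if b in closure:
                    continue
                if (holds_somewhere(pre_b, state) and not conflict_free(pre_b, eff_a)) \
                        or not conflict_free(eff_b, eff_a):
                    closure.add(b)
                    changed = True
    return closure


# Hypothetical task: a is applicable and writes x = 1.
actions = {
    "a": ({"x": 0}, {"x": 1}),
    "b": ({"x": 0, "y": 0}, {"y": 1}),  # needs x = 0, which a destroys
    "c": ({"y": 1}, {"x": 2}),          # writes x differently from a
    "e": ({"z": 0}, {"z": 1}),          # independent of a
}
state = {"x": 0, "y": 0, "z": 0}
print(sorted(action_closure({"a"}, actions, state)))  # ['a', 'b', 'c']
```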

A simple procedure, shown in Algorithm 1, can be used to find the action closure 
for a given action set A. 

The proposed POR algorithm, called stubborn action core (SAC), works as follows. At any given state s, the expansion set E(s) of state s is determined by Algorithm 2.
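The following Python sketch shows one way to compute an expansion set in the spirit of Algorithm 2. It rests on our own reading, flagged here as an assumption: the proof of Theorem 7 needs E(s) to be closed both under the support edges of ASG(s) (part a) and under the conflict rules of Algorithm 1 (parts b and c), so the sketch iterates both rules to a fixpoint starting from a given landmark action set and then, as described in Section 6, keeps only the actions applicable in s. Actions are `(pre, eff)` pairs of dicts keyed by name; the function name is hypothetical.

```python
def sac_expansion_set(landmarks, actions, state):
    """Fixpoint of ASG support edges and Algorithm 1 conflict rules, seeded
    with `landmarks`; returns the members applicable in `state`."""

    def applicable(name):
        return all(state.get(v) == val for v, val in actions[name][0].items())

    def conflict_free(p, q):
        return all(q[v] == val for v, val in p.items() if v in q)

    target = set(landmarks)
    changed = True
    while changed:
        changed = False
        for a in list(target):
            pre_a, eff_a = actions[a]
            for b, (pre_b, eff_b) in actions.items():
                if b in target:
                    continue
                if not applicable(a):
                    # ASG(s) edge a -> b: b produces an assignment a requires.
                    pull = any(v in pre_a and pre_a[v] == val for v, val in eff_b.items())
                else:
                    # Algorithm 1: b conflicts with the applicable action a.
                    touches_s = any(state.get(v) == val for v, val in pre_b.items())
                    pull = (touches_s and not conflict_free(pre_b, eff_a)) \
                        or not conflict_free(eff_b, eff_a)
                if pull:
                    target.add(b)
                    changed = True
    return {a for a in target if applicable(a)}


# Hypothetical task: landmark a is blocked on y = 1, which b produces;
# d is pulled in because b destroys its precondition y = 0; c is unrelated.
actions = {
    "a": ({"y": 1}, {"g": 1}),
    "b": ({"u": 0}, {"y": 1}),
    "c": ({"z": 0}, {"z": 1}),
    "d": ({"y": 0}, {"w": 1}),
}
state = {"y": 0, "z": 0, "w": 0, "g": 0, "u": 0}
print(sorted(sac_expansion_set({"a"}, actions, state)))  # ['b', 'd']
```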

There are various ways to find a landmark action set for a given state. Here we give one example that is used in our current implementation. To find a landmark action set L at s, we utilize the DTGs associated with the SAS+ formalism. We first find a transition set that includes all possible transitions $(s_i, v_i)$ in an unachieved goal-related DTG $G_i$, where $s_i$ is the current state of $G_i$ in s. It is easy to see that all actions that mark transitions in this set make up a landmark action set, because
input : A SAS+ task with action set O, an action set A ⊆ O, and a state s
output: An action closure C(A) of A

C(A) ← A;
repeat
    foreach action a in C(A) applicable in s do
        foreach action b in O \ C(A) do
            if pre(b) ∩ s ≠ ∅ and pre(b) and eff(a) are not conflict-free then
                C(A) ← C(A) ∪ {b};
            end
            if eff(b) and eff(a) are not conflict-free then
                C(A) ← C(A) ∪ {b};
            end
        end
    end
until C(A) is not changing;
return C(A);

Algorithm 1: A procedure to find action closure

input : A SAS+ planning task and state s
output: The expansion set E(s)

Find a landmark action set L at s;
Calculate the action core AC_s(L) of L using Algorithm 1;
Use AC_s(L) as E(s);

Algorithm 2: The SAC algorithm

$G_i$ is unachieved and at least one action marking a transition that starts from $s_i$ has to be performed in any solution plan.
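A minimal Python sketch of this landmark construction, assuming actions are `(pre, eff)` pairs of dicts keyed by name, and that an action labels a DTG transition leaving the current value $s_i$ of $x_i$ when it changes $x_i$ and either requires $x_i = s_i$ or places no precondition on $x_i$ (our reading of how DTG edges are labeled). The first unachieved goal variable is used; the function name is hypothetical.

```python
def landmark_actions(actions, state, goal):
    """Actions labelling transitions out of s_i in the DTG of the first goal
    variable x_i whose current value s_i differs from its goal value."""
    for var, target in goal.items():
        current = state[var]
        if current != target:
            return {
                name for name, (pre, eff) in actions.items()
                if var in eff and eff[var] != current
                and pre.get(var, current) == current
            }
    return set()  # the state already satisfies the goal


actions = {
    "m": ({"x": 0}, {"x": 1}),
    "n": ({"x": 1}, {"x": 2}),  # starts from x = 1, not from the current value
    "k": ({}, {"x": 2}),        # no precondition on x, so it can leave any value
    "q": ({"y": 0}, {"y": 1}),  # does not touch x
}
print(sorted(landmark_actions(actions, {"x": 0, "y": 0}, {"x": 2})))  # ['k', 'm']
```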

There are also other ways to find a landmark action set. For instance, the preprocessor in the LAMA planner [Richter et al. 2008] can be used to find landmark facts, and all actions that lead to these landmark facts also make up a landmark action set.

Theorem 7. For a state s, the expansion set E(s) defined by the SAC algorithm 
is a stubborn set at s. 

Proof. We first prove that our expansion set E(s) satisfies condition A1 in Definition 11; namely, for any action $b \in E(s)$ and actions $b_1, \dots, b_k \notin E(s)$, if $(b_1, \dots, b_k, b)$ is a valid path from s, then $(b, b_1, \dots, b_k)$ is also a valid path and leads to the same state that $(b_1, \dots, b_k, b)$ does.

To simplify this proof, we can treat the action sequence $(b_1, \dots, b_k)$ as a "macro" action B, where an assignment $x_t = v_t$ is in pre(B) if and only if $x_t = v_t$ is in the precondition of some $b_i \in B$ and $x_t = v_t$ is not in the effects of a previous action $b_j$ ($j < i$), and an assignment $x_t = v_t$ is in eff(B) if and only if $x_t = v_t$ is in the effect set of some $b_i \in B$ and $x_t$ is not assigned any value other than $v_t$ in the effects of a later action $b_j$ ($j > i$). In the following proof, we use the macro action B in place of the path $(b_1, \dots, b_k)$.
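The construction of the macro action B can be sketched in Python as below, assuming actions are `(pre, eff)` pairs of dicts and, as in the proof, that $(b_1, \dots, b_k)$ is a valid path. Under that assumption, skipping any variable already written by an earlier action is equivalent to the condition on previous effects stated above. The function name is hypothetical.

```python
def macro_action(path):
    """Collapse a valid path [(pre, eff), ...] into one macro action (pre_B, eff_B)."""
    pre_macro, eff_macro = {}, {}
    written = set()  # variables assigned by some earlier action of the path
    for pre, eff in path:
        for v, val in pre.items():
            # A precondition belongs to pre(B) only if no earlier action produced it.
            if v not in written and v not in pre_macro:
                pre_macro[v] = val
        for v, val in eff.items():
            eff_macro[v] = val  # a later assignment to v overrides an earlier one
            written.add(v)
    return pre_macro, eff_macro


path = [({"x": 0}, {"x": 1}), ({"x": 1, "y": 0}, {"y": 1}), ({}, {"x": 2})]
print(macro_action(path))  # ({'x': 0, 'y': 0}, {'x': 2, 'y': 1})
```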
To prove A1, we only need to prove that if (B, b) is a valid path, then $s : b \Rightarrow B$. According to Theorem 4, $s : b \Rightarrow B$ if and only if the following four propositions are true.

a) Action b must be applicable in s. We prove this by contradiction. Let $s' = apply(s, B)$. If b is not applicable in s but is applicable in $s'$, then B supports b. Since all effects of B come from actions in the path $(b_1, \dots, b_k)$, there exists an action $b_i \in \{b_1, \dots, b_k\}$ such that $b_i$ supports b. However, according to Definition 32, $b_i$ is in the transitive closure of b in ASG(s). According to our algorithm, $b_i$ should then be in E(s). This contradicts our assumption that $b_i \notin E(s)$. Thus, b must be applicable in s.

b) pre(B) and eff(b) are conflict-free. We prove this proposition by contradiction. If pre(B) and eff(b) are not conflict-free, we assume that pre(B) has an assignment $x_t = v_t$ that conflicts with an assignment in eff(b). According to the way we define B, there exists an action $b_i \in (b_1, \dots, b_k)$ such that $x_t = v_t \in pre(b_i)$. Also, since B is applicable in s, we know that $x_t$ takes the value $v_t$ in s as well. Therefore, $pre(b_i)$ and eff(b) are not conflict-free. However, according to Definition 33 and Algorithm 1, $b_i$ is then in E(s). This contradicts our assumption that $b_i$ is not in E(s). Thus, pre(B) and eff(b) are conflict-free.

c) eff(B) and eff(b) are conflict-free. The proof of this proposition is very similar to the one above. If they are not conflict-free, there must be an action $b_i \in (b_1, \dots, b_k)$ such that eff(b) and $eff(b_i)$ are not conflict-free. However, according to Definition 33 and Algorithm 1, $b_i$ is then in E(s). This contradicts our assumption that $b_i$ is not in E(s). Thus, eff(B) and eff(b) are conflict-free.

d) pre(b) and eff(B) are conflict-free. This proposition holds since we assumed in condition A1 that (B, b) is a valid path from s.

Thus, from Theorem 4, we see that $s : b \Rightarrow B$ and that condition A1 in Definition 11 is true.

Now we verify condition A2 by showing that any solution path p from s contains at least one action in E(s). From the definition of landmark action sets, we know that there exists an action $l \in L$ such that p contains l. If l is applicable in s, it is itself an action of E(s), as E(s) is computed from L. Otherwise, applying Lemma 3 to the prefix of p that ends with l, we know that $AC_s(l)$ contains at least one action of p that is applicable in s. Thus, E(s) indeed contains at least one action in p.

Since E(s) satisfies conditions A1 and A2 in Definition 11, E(s) is a stubborn set in state s. ■

5.1 SAC vs. EC 

SAC gives stronger reduction than the previous EC algorithm, since it is based on 
actions, which have a finer granularity than DTGs do. Specifically, SAC gives more 
reduction than EC for two reasons. First, applicable actions that are not associated 
with landmark transitions, even if they are in the same DTG, are expanded by EC 
but not by SAC. Second, applicable actions that do not support any actions in the 
landmark action set, even if they are in the same DTG, are expanded by EC but 
not by SAC. 

To give an example, in Figure 4a, $G_1$, $G_2$, and $G_3$ are three DTGs. The goal assignment is marked as an unfilled circle in $G_1$; a, b, c, d, and e are actions. Dashed arrows denote the preconditions of actions. For instance, the lower dashed arrow means that b requires a precondition $x_3 = w$.

Fig. 4. Search spaces of EC and SAC. (a) A SAS+ task; (c) Search space of SAC.

Fig. 5. System Architecture of FD and SAC.

In this example, according to EC, $G_1$ is a goal DTG, and $G_2$ and $G_3$ are in the dependency closure of $G_1$. Thus, before executing a, EC expands every applicable action in $G_1$, $G_2$, and $G_3$ at any state. SAC, on the other hand, starts with the singleton set {a} as the initial landmark action set and ignores action e. The applicable action c is also not included in the action closure in state s since it does not support a. The search graphs are compared in Figure 4, and we see that SAC gives stronger reduction.

6. SYSTEM IMPLEMENTATION 

We adopt the Fast Downward (FD) planning system [Helmert 2006] as our code base. The overall architecture of FD is described in Figure 5. A complete FD system contains three parts corresponding to three phases in execution: translation, knowledge compilation, and search. The translation module converts planning tasks described in PDDL into a SAS+ planning task. The knowledge compilation module generates domain transition graphs and the causal graph for the SAS+ planning task. The search module implements various state-space search algorithms as well as heuristic functions. All three modules communicate through temporary files.

We make two additions to the above system to implement our SAC planning system, as shown in Figure 5. First, we add a "commutativity analysis" module to the knowledge compilation step to identify commutativity between actions. Second, we add a "space reduction" module to the search module to conduct state space reduction. The commutativity analysis module is used to build left commutativity relations between actions and to build the action support graph. It reads action information from the output of the knowledge compilation module and determines the left commutativity relations between actions according to the conditions in Theorem 3. In addition, this module also determines whether one action is supported by another and builds the action support graph defined in Definition 31. The reduction module for search is used to generate a stubborn set for a given state. We implement the SAC algorithm in this module. Starting from a landmark action set L as the target action set, we find the action closure $AC_s(L)$ by iteratively adding actions that support actions in the target action set to the target action set until it no longer changes. We then use the applicable actions in the action closure as the set of actions to expand at s. In other words, in our SAC system, during the search, for any given state s, instead of using the successor generator provided by FD to generate the set of applicable operators, we use the reduction module to generate a stubborn set in state s and use it as the expansion set.

It is easy to see that the overall time complexity of determining left commutativity relationships between actions is $O(|A|^2)$, where $|A|$ is the number of actions. We implement this module in Python. Since the number of actions $|A|$ is usually not large, in most cases the commutativity analysis module takes less than 1 second to finish. This module only runs once for solving a planning problem. Therefore, the commutativity analysis module adds an insignificant amount of overhead to the system. Theoretically, the worst-case time complexity for finding the action closure is $O(|A|^2)$. However, in practice, by choosing the landmark action set L associated with transitions in an unachieved goal-related DTG starting from the current state, the procedure of finding the action closure terminates quickly, after about 4 to 5 iterations. Therefore, adding the reduction module does not significantly increase the overall search overhead either. We implement this module in C++ and incorporate it into the search module of FD.

7. EXPERIMENTAL RESULTS 

We test our algorithm on problems from the recent International Planning Competitions (IPCs): IPC4 and IPC5. We implemented our algorithm on top of the Fast Downward (FD) planner [Helmert 2006]. We only modified the state expansion part.

We have implemented our SAC algorithm and tested it along with Fast Downward and its combination with the EC extension on a Red Hat Linux server with 2 GB of memory and one 2.0 GHz CPU. The admissible HSP $h_{max}$ heuristic [Bonet and Geffner 2001] and the inadmissible Fast Forward (FF) heuristic [Hoffmann and Nebel 2001] are used in our experiments.

First, we apply our SAC algorithm to A* search with the HSP $h_{max}$ heuristic [Bonet and Geffner 2001]. We also turn off the option of preferred operators [Helmert 2006] since it compromises the optimality of A* search. Table I shows the detailed results on node expansion and generation during the search. We also compare the solving times of these three algorithms. As we can clearly see from Table I, the numbers of expanded nodes of the SAC-enhanced A* algorithm are consistently lower than those of the baseline A* algorithm and the EC-enhanced A* algorithm. In some cases, the number of generated nodes of the SAC-enhanced algorithm is slightly larger than that of the baseline A* or EC-enhanced A* algorithm. This is possibly due to the tie-breaking of states with equal heuristic values during search.

We can also see that the computational overhead of SAC is low. For instance, 
in the Freecell domain, the running time of the SAC-enhanced algorithm is only 
slightly higher than the baseline and lower than the EC-enhanced algorithm, despite 
their equal number of expanded and generated nodes. 

Aside from the A* algorithm, we also test SAC on best-first search. Although POR preserves completeness and optimality, it can also be combined with suboptimal searches such as best-first search to reduce their search space. In this comparison, we turned off the option of preferred operators in our experiment for FD. Preferred operators are another space reduction method that does not preserve completeness, and using them with EC or SAC will lead to worse performance. We will investigate how to find synergy between these two approaches in future work. We summarize the performance of three algorithms, original Fast Downward (FD), FD with EC, and FD with SAC, in Table II by presenting the number of problem instances in a planning domain that can be solved within 1800 seconds by each solver. We also ignore small problem instances with solving time less than 0.01 seconds. All three solvers use the inadmissible Fast Forward (FF) heuristic. As we can see from Table II, when combined with a best-first search algorithm, SAC can still reduce the number of generated and expanded nodes compared to the baseline FD algorithm and the EC-enhanced algorithm. In many problems (e.g., pipesworld18, tpp15, trucklS), the savings in the number of expanded states can be orders of magnitude.

8. RELATED WORK 

We discuss some related work in this section. 
8.1 Symmetry 

Symmetry detection is another way of reducing the search space [Fox and Long 1999]. From the view of node expansion on a SAS+ formalism for planning, we can see that symmetry removal is different from SAC. For example, consider a domain with three objects $A_1$, $A_2$, and B, where $A_1$ and $A_2$ are symmetric, and actions associated with B have no conflict with any actions associated with $A_1$ or $A_2$. In this case, symmetry removal will expand actions associated with ($A_1$ and B) or ($A_2$ and B), whereas SAC will only expand actions associated with the DTG for B if we pick a landmark action set based on it. This is because neither the action core nor the action closure will include any actions associated with $A_1$ or $A_2$, as they have no conflict with actions in B.

Intuitively, symmetry removal finds that it is not important whether $A_1$ or $A_2$ is used since they are symmetric, whereas SAC finds that it is not important whether actions associated with B are used before or after actions associated with $A_i$, $i = 1, 2$, since there is no conflict. In fact, SAC can also detect stronger relationships, such as the fact that it is safe to use actions associated with B before those associated with $A_i$, since any path that uses actions associated with $A_i$ before those associated with B corresponds to another valid path with the same cost.

Further, there is limited research on domain-independent detection and removal 
of symmetry. The method by Fox and Long [Fox and Long 1999] detects symmetry 
from the specification of initial and goal states and may miss many symmetries. 



8.2 Factored planning 

Factored planning [Amir and Engelhardt 2003; Brafman and Domshlak 2006; Kelareva et al. 2007] is a class of search algorithms that exploits the decomposition of the state space. In essence, factored planning finds all the subplans for each individual subgraph and tries to merge them. There are some limitations of factored planning.

First, for problems with dense subgraphs, the number of subplans in each subgraph
may be very large, making the search very expensive. Worse, the goal is not specified
in many subgraphs, which leads to even more subplans that need to be considered. We
have conducted an empirical study of this issue. For example, pipesworld20 has 96
DTGs, 18 of which contain goal facts. Even if we consider only one DTG and apply the
canonicality assumption that each state can be included at most once in any subplan,
the number of subplans from the initial state to the goal can be as high as
1.96 × 10^9 in DTG #16 generated by Fast Downward. The number is high because there
are multiple transition paths, and each transition can be associated with many
actions. If we multiply the numbers of possible subplans of the 18 DTGs containing
goals, the product is approximately of the order of 10^120. Thus, the search space
will be extremely large if we consider all 96 DTGs (78 of which do not even have a
goal state) and remove the canonicality assumption. Of course, techniques such as tree
search and pruning [Kelareva et al. 2007] can speed up the process, but the potential
speedup is largely unknown.
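Under the canonicality assumption, the subplan count for one DTG is the number of simple paths from the initial value to the goal value, where each transition contributes a factor equal to the number of actions that label it. The sketch below computes this by depth-first search; the adjacency-list encoding and the function name are our own assumptions, and the toy DTG is illustrative rather than taken from pipesworld20. Because counts from independent DTGs multiply, their base-10 exponents add, which is why a product over 18 DTGs quickly reaches astronomical magnitudes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical DTG encoding: value -> list of (successor value, number of actions labelling that transition)
DTG = Dict[str, List[Tuple[str, int]]]


def count_canonical_subplans(dtg: DTG, start: str, goal: str) -> int:
    """Count action sequences from start to goal that visit each DTG value at most once."""
    visited: Set[str] = {start}

    def dfs(v: str) -> int:
        if v == goal:
            return 1  # a subplan ends once the goal value is reached
        total = 0
        for w, n_actions in dtg.get(v, []):
            if w not in visited:
                visited.add(w)
                total += n_actions * dfs(w)  # any of the n_actions can realise v -> w
                visited.remove(w)
        return total

    return dfs(start)


toy = {
    "a": [("b", 2), ("c", 3)],
    "b": [("d", 1), ("c", 1)],
    "c": [("d", 2)],
}
print(count_canonical_subplans(toy, "a", "d"))  # 12 = 2*1 + 3*2 + 2*1*2
```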

Second, since the canonicality assumption does not hold for many domains, and the
number of subplans is potentially infinite when the subplan length is unrestricted,
the factored planning algorithm needs to use schemes such as iterative deepening
[Brafman and Domshlak 2006] to restrict the subplan length. These schemes further
increase the complexity and may compromise the global optimality of the resulting
plan [Kelareva et al. 2007].
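Dropping the canonicality assumption makes the count unbounded whenever the DTG contains a cycle, which is why a length bound such as iterative deepening is needed. The sketch below, using the same hypothetical adjacency encoding, counts action sequences of length at most `max_len` that end at the goal; it illustrates the growth only and is not the factored planning algorithm of [Brafman and Domshlak 2006].

```python
from typing import Dict, List, Tuple

DTG = Dict[str, List[Tuple[str, int]]]


def count_subplans_up_to(dtg: DTG, start: str, goal: str, max_len: int) -> int:
    """Count action sequences of length <= max_len from start that end at goal.
    Values may repeat, i.e. the canonicality assumption is not imposed."""
    ways = {start: 1}  # number of sequences of the current length ending at each value
    total = 1 if start == goal else 0
    for _ in range(max_len):
        nxt: Dict[str, int] = {}
        for v, n in ways.items():
            for w, n_actions in dtg.get(v, []):
                nxt[w] = nxt.get(w, 0) + n * n_actions
        ways = nxt
        total += ways.get(goal, 0)
    return total


# A two-value cycle a <-> b with an exit b -> g, one action per transition.
cyclic = {"a": [("b", 1)], "b": [("a", 1), ("g", 1)]}
# Iterative deepening raises the bound step by step; the count keeps growing.
print([count_subplans_up_to(cyclic, "a", "g", k) for k in (2, 4, 6, 8)])  # [1, 2, 3, 4]
```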

In summary, although factored planning has shown potential in some domain-dependent
studies, its practicality for general domain-independent planning has not yet been
established. We note that the POR algorithms studied in this paper are not mutually
exclusive with factored planning; it is possible to integrate POR into factored
planning to reduce the cost of search.
8.3 Planning utilizing the SAS+ decomposition

Our POR method is based on the SAS+ representation [Helmert 2006]. Recently,
there has been increasing interest in utilizing the SAS+ representation.

The Fast Downward planner [Helmert 2006] derives its heuristic function by
analyzing the causal graphs built on top of SAS+ models. Another SAS+ based
heuristic for optimal planning was recently obtained from a linear programming
model that encodes DTGs [van den Briel et al. 2007]. The LAMA planner derives
inadmissible heuristic values by analyzing landmarks in SAS+ models [Richter et al.
2008]. An admissible version of this heuristic, based on action cost partitioning,
was proposed in [Karpas and Domshlak 2009]. Yet another admissible heuristic, called
'merge-and-shrink', is based on abstraction of domain transitions [Helmert et al.
2007] and strictly dominates the admissible landmark heuristics [Helmert and
Domshlak 2009]. Moreover, long-distance mutual exclusion constraints based on
DTG analysis have been proposed and shown to be effective in speeding up SAT-based
optimal planners [Chen et al. 2009]. The DTG-Plan planner searches directly in the
space of DTGs in a hierarchical decomposition fashion [Chen et al. 2008]; it is fast
but neither complete nor optimal.

Compared to the above work, POR offers a completely new approach to exploiting
the state-space decomposition in the SAS+ representation. It is orthogonal to the
design of better heuristics, and it provides a systematic, theoretically sound way
to reduce search costs.

POR is most effective for problems where the action support graphs are directional
and the inter-action dependencies are sparse. It may not be useful for problems where
the actions are strongly connected and there is a high degree of inter-action
dependency. For example, it is not useful for the 15-puzzle, where each action on
each piece is supported by surrounding actions, which makes the action support graph
strongly connected. In this case, POR cannot give any reduction during the search.
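Whether an action support graph is strongly connected can be checked with two reachability passes, one forward and one over reversed edges. The sketch below assumes one possible definition, an edge from action a to action b whenever an effect of a achieves a precondition of b; the exact construction used for POR may differ, and all names are hypothetical. A puzzle-like domain in which moves enable each other yields a strongly connected graph, while a load, drive, unload chain does not.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Action:
    name: str
    pre: Dict[str, str]
    eff: Dict[str, str]


def support_edges(actions: List[Action]) -> Dict[str, Set[str]]:
    """Assumed definition: a -> b when some effect of a achieves a precondition of b."""
    adj: Dict[str, Set[str]] = {a.name: set() for a in actions}
    for a in actions:
        for b in actions:
            if a.name != b.name and any(b.pre.get(v) == val for v, val in a.eff.items()):
                adj[a.name].add(b.name)
    return adj


def _reach(adj: Dict[str, Set[str]], src: str) -> Set[str]:
    seen, stack = {src}, [src]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_strongly_connected(adj: Dict[str, Set[str]]) -> bool:
    nodes = set(adj)
    if not nodes:
        return True
    rev: Dict[str, Set[str]] = {u: set() for u in nodes}
    for u, ws in adj.items():
        for w in ws:
            rev[w].add(u)
    root = next(iter(nodes))
    # Every node reachable from root, and root reachable from every node.
    return _reach(adj, root) == nodes and _reach(rev, root) == nodes


puzzle = [Action("right", {"blank": "0"}, {"blank": "1"}),
          Action("left", {"blank": "1"}, {"blank": "0"})]
chain = [Action("load", {"pkg": "depot", "truck": "depot"}, {"pkg": "truck"}),
         Action("drive", {"truck": "depot"}, {"truck": "city"}),
         Action("unload", {"pkg": "truck", "truck": "city"}, {"pkg": "city"})]
print(is_strongly_connected(support_edges(puzzle)))  # True
print(is_strongly_connected(support_edges(chain)))   # False
```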

9. CONCLUSIONS AND FUTURE WORK 

Previous work in both model checking and AI planning has demonstrated that
POR is a powerful method for reducing search costs. POR is an enabling technique
for model checking, which would not be practical without POR due to its high
complexity. Although POR has been extensively studied for model checking, its
theory had not been developed for AI planning. In this paper, we developed a
new POR theory for planning that parallels the stubborn set theory in model
checking.

In addition, by analyzing the structure of actions in planning problems, we derived
a practical criterion that defines left commutativity between actions. Based on the
notion of left commutativity, we developed sufficient conditions for finding stubborn
sets during search for planning.
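To make "finding stubborn sets during search" concrete, the sketch below follows the generic closure known from the stubborn set literature: start from the achievers of one unsatisfied goal fact; for a disabled action, add the achievers of one of its unsatisfied preconditions; for an enabled action, add every action it may interfere with. The interference test is a deliberately coarse over-approximation of our own (both actions write a common variable, or one writes a variable the other reads), standing in for left commutativity; it is not the paper's criterion or the SAC algorithm, and all names are hypothetical. Only the enabled actions in the returned set need to be expanded.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Action:
    name: str
    pre: Dict[str, str]
    eff: Dict[str, str]


def enabled(state: Dict[str, str], a: Action) -> bool:
    return all(state.get(v) == val for v, val in a.pre.items())


def interferes(a: Action, b: Action) -> bool:
    """Coarse, symmetric over-approximation of 'a and b may not commute' (an assumption)."""
    wa, wb, ra, rb = set(a.eff), set(b.eff), set(a.pre), set(b.pre)
    return bool(wa & (wb | rb)) or bool(wb & ra)


def stubborn_set(state: Dict[str, str], actions: List[Action], goal: Dict[str, str]) -> Set[str]:
    by_name = {a.name: a for a in actions}
    unmet = [(v, val) for v, val in goal.items() if state.get(v) != val]
    if not unmet:
        return set()  # goal state: nothing to expand
    v0, val0 = unmet[0]
    work = [a.name for a in actions if a.eff.get(v0) == val0]  # achievers of one goal fact
    result = set(work)
    while work:
        a = by_name[work.pop()]
        if enabled(state, a):
            new = [b.name for b in actions if b.name != a.name and interferes(a, b)]
        else:
            # A disabled action has at least one unsatisfied precondition; add its achievers.
            v, val = next((v, val) for v, val in a.pre.items() if state.get(v) != val)
            new = [b.name for b in actions if b.eff.get(v) == val]
        for n in new:
            if n not in result:
                result.add(n)
                work.append(n)
    return result


acts = [Action("load", {"pkg": "depot", "truck": "depot"}, {"pkg": "truck"}),
        Action("drive", {"truck": "depot"}, {"truck": "city"}),
        Action("unload", {"pkg": "truck", "truck": "city"}, {"pkg": "city"}),
        Action("noise", {"other": "x0"}, {"other": "x1"})]
s = {"pkg": "depot", "truck": "depot", "other": "x0"}
T = stubborn_set(s, acts, {"pkg": "city"})
by_name = {a.name: a for a in acts}
print(sorted(n for n in T if enabled(s, by_name[n])))  # ['drive', 'load']; 'noise' is pruned
```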

Furthermore, we applied our theory to explain two previous POR algorithms for
planning. The explanation provided useful insights that led to a stronger and
more efficient POR algorithm called SAC. Compared to previous POR algorithms,
SAC finds stubborn sets using a finer granularity for checking left commutativity,
leading to stronger reduction. We compared the performance of SAC to the previously
proposed EC algorithm on both optimal and non-optimal state space searches.
Experimental results showed that SAC yields significantly stronger node reduction
with less overhead.

In future work, we plan to develop stronger POR algorithms for planning based on
our theoretical framework and to study their interaction with other search
reduction techniques such as preferred operators [Helmert 2006], abstraction
heuristics [Helmert et al. 2007], landmarks [Richter et al. 2008], and symmetry
detection [Fox and Long 1999].
Table I: Comparison of FD, EC, and SAC using A* with the h_max heuristic on IPC domains.
We show the numbers of expanded and generated nodes. "-" means timeout after 300 seconds.
For each problem, we also highlight the best values of expanded and generated nodes
among the three algorithms, if there is any difference.
Table II: Comparison of FD, EC, and SAC without preferred operators on IPC domains.
We show the numbers of expanded and generated nodes. "-" means timeout after 1800 seconds.
For each problem, we also highlight the best values of expanded and generated nodes
among the three algorithms, if there is any difference.